ABSTRACT



A comparison of the properties of non-modifying and self-modifying
programs leads to the definition of independent and dependent
instructions. Because non-modifying programs contain only independent
instructions, such programs can be analyzed by a straightforward,
two-step analysis procedure. First, the program control flow is
detected; second, that control flow is used to determine the program
data flow or data processing. However, self-modifying programs can also
contain dependent instructions, and the program control flows and data
flows exhibit cyclic interaction. This cyclic interaction suggests the
use of an iterative or a relaxation analysis technique. The initial
step in the relaxation procedure determines a first approximation to
control flow; the second step then finds a first approximation to data
flow. These two steps are repeated until a steady-state condition is
reached.

Algorithms for implementing the first iteration are presented. These
algorithms are capable of analyzing programs which modify their control
and processing instructions during the course of execution. In addition,
data structures are described which permit the construction of functional
expressions for the data flow or information processing. Finally, actual
output flowcharts of self-modifying programs are displayed.



CHAPTER 1



SUMMARY 



This chapter outlines the organization of this thesis. 

The second chapter is an introduction to automatic program analysis 
by digital computer. Automatic program analysis is defined as the 
construction of a flowchart from an original source program without human 
assistance. Development of such an analysis capability is motivated 
by its possible use as a documentation and debugging tool. The history 
of automatic program analysis is presented. The purposes, objectives, 
scope, and restrictions of the thesis are stated. 

The third chapter presents the major problems of analyzing programs
which modify themselves. A comparison of the properties of
non-modifying and self-modifying programs leads to a statement of the
general analysis problem and a general analysis procedure.

The fourth chapter discusses the major techniques used in the 
general analysis procedure. The solution philosophy required for a 
successful analysis is stated. The general organization of the analysis 
system is outlined. Finally, a more detailed description of the indivi- 
dual analysis techniques is given. 

The fifth chapter displays the results of applying the existing 
analysis system to example programs. The layout and symbols of the output 
flowcharts are explained. Automatically produced flowcharts of programs 
containing particular analysis problems are presented. 



The sixth chapter summarizes and evaluates the specific results
shown in the earlier chapters and discusses reasonable extensions
of these results.

The first appendix contains the general flowcharts of the analysis
system subroutines. The second appendix displays output flowcharts
produced by applying the analysis system to some of its own subroutines.



CHAPTER 2



INTRODUCTION



This chapter is an introduction to automatic program analysis. 
First, the general problem of such analysis is presented, and includes 
a discussion of what automatic program analysis involves and why it is 
useful. Finally, the purpose, objectives, scope, and restrictions of 
this thesis are given. 



2.1 MOTIVATION 

In the early days of computer development, a detailed step-by-step
machine-language program, i.e. numerical code, had to be written before
a computer could be used to solve any problem. Because writing each
new program in machine language required excessive coding and debugging
time, special programming aids were devised. Today, all machines have
assemblers that permit the programmer to use symbolic operation codes
and symbolic addresses. In addition, debugging packages and memory-dump
routines help the program tester reduce debugging and testing time.
Finally, general-purpose languages, such as FORTRAN and MAD, enable
inexperienced programmers to write programs without worrying about
machine-language errors.

All of these programming aids are designed to help the programmer 
write a new routine, but are of restricted use in understanding or
modifying an existing program even by its original author. In such a
situation there is no substitute for adequate, clear, and pedagogically 
meaningful documentation of the intent and details of the programming 
algorithms. In the absence of such information, a user would struggle 
through the code to convert the existing program back into a block 
diagram or a flowchart. After the flowchart was reconstructed, the 
programmer could begin to understand both the function and algorithms 
of the routine as the sum of its parts. During such a reconstruction, 
a human programmer performs many tasks which could be automated; and 
thus, major portions of such automatic analysis could be performed by 
the computer. 

Automatic program analysis can clearly be applied to any aspect of
producing pedagogically meaningful program documentation. For our
purposes, we shall consider the construction of an accurate and concise
flowchart from an original assembly-language source program without
human assistance to represent a useful form of such information. This
flowcharting procedure must produce the flowchart "boxes" with their
sequential processes, and all such procedures must be interconnected.
The flowchart boxes and interconnections represent the control flow of
the program, i.e. the program instruction execution sequence. The
functional relationships inside the flowchart boxes express the data
flow of the program, i.e. the program information processing. Flowcharts
are generally accepted as the sine qua non of documentation procedures.
The major difficulties in machine-generated flowcharts (over and above
the sheer difficulty of the problem) are no different from those
encountered in hand-generated ones. The more compact, concise, and
meaningful the document, the greater the departure from machine and
processing detail, and thus the more reasoning and abstraction required
of the "analyst" and the less of the user. Results of this automatic
analysis, even in a somewhat detailed form, would be useful either as a
debugging tool or as a documentation tool.

As a debugging tool, the analysis program could analyze and display 
all possible execution paths, not just those that might be executed 
during the testing session. At the same time, the analysis program 
could call attention to any obvious program inconsistencies, before the 
debugging and testing sessions began. 

As a documentation tool, the analysis program could automatically 
provide final flowcharts for program documentation. This would allow 
the programmer to spend more of his time generating program code and 
less time documenting code. If flowcharts were prepared automatically, 
it would be easy to have an up-to-date version immediately after code 
corrections or additions were made. Also, a current flowchart would 
help reduce coding interruptions due to programming staff changes. If 
the results of automatic analysis were presented in a standardized 
mathematical form, it should be possible for a non-programmer with a 
general mathematical background to understand the algorithm and comprehend 
its implications. Finally, automatic program analysis should increase 
the human capability for understanding large programmed systems, by
raising the level at which the human being assumes an analytic role.

Besides the direct use of program analysis for debugging and docu- 
mentation, there are problems which can build on the results of such 
analysis. The solution of these problems requires an understanding of 
the interaction between programming languages and the execution of their 
generated machine code. Examples of three such problems are given, and 
the following discussion includes a statement of the problem and justi- 
fication for its solution. 

The development of large, interactive digital systems has made the 
estimation of program execution time less reliable (13). In a time- 
sharing system the operations manager cannot predict the throughput of 
his system, just as in a large military command-and-control system the 
commander cannot ascertain the information input conditions which will 
saturate his facility. A better understanding of the relationship between 
a programmed system and its machine execution requires a knowledge of 
execution times and storage requirements as a function of the program. 
With such data, a system analyst can decide what improvements need to be 
made and what improvements can be made. 

Today, it is still accepted that programs which are to be used
repeatedly should be written in machine-language, while those used just
now and then could be written in a general-purpose compiler language.
Thus, it is possible to pay for higher programming costs with the
savings from machine-time expenses. However, this balance can shift
because of a shortage of assembly-language programmers. Since there has
always been a shortage of capable programmers, why not develop an
automatic machine-code-optimization procedure that could be used either
during or after the compilation of a program (10)? Thus, relatively
efficient machine code could be generated by relatively inexperienced
programmers.

The last example concerns the reprogramming effort required by a 
change of machines. At present, this usually means converting to a new 
language. However, future system managers will be concerned not only 
with changes in machine language, but also changes in machine structure 
(e.g., from single processing to multiprocessing). If the switch is to 
be worthwhile, a manager must take advantage of the new structure, and 
he is faced with an inevitable reprogramming task. 

Also, the system manager would like to have his users or customers 
take advantage of his new facilities. However, at the same time he must 
not increase a user's cost per unit of processing. The answer to this 
problem is to provide an automatic reprogramming system which can convert 
from one language to another and still increase efficiency by taking 
advantage of all the new features which prompted the machine change (9). 

Although hopefully a clear case has been made for the desirability
of machine program analysis, its feasibility, practical utility, and
difficulty of realization are far from clear. Utility assessment must
await availability, and the problem is far from trivial. In fact, it is
the impossibility of finding a complete, closed-form solution to the
problem of program analysis (a known consequence of Turing machine
theory) that has in part impeded the needed theoretical interest in the
problem. Such applied work as has been noted in the literature is
scattered and is far short of the requirements for even a rudimentary
flowcharter.



2.2 HISTORY

The purpose of this section is to review the literature that has 
appeared in the area of program analysis. The review is intended to 
show what has been done so that the context of this thesis may be seen. 
This presentation is divided into four parts: Directed-Graph Theory as 
Applied to Program Analysis; Program Analysis of Compiler-Language Source
Programs; Program Analysis of Machine-Language Source Programs; and the 
Presentation of Program Analysis Results via Flowcharts. The work which 
we will describe is generally much too restrictive to be useful for the 
patterns of assembly-language coding which are generally utilized. 



2.2.1 Directed-Graph Theory 

A digital computer program can be represented by a directed-graph 
model if all control paths are known ab initio. Nodes of the graph
represent blocks of code, and branches of the graph represent control 
paths. With such a model, results of classical directed-graph theory 
can be applied to the program analysis problem, in the sense of pre- 
dicting connectivity between arbitrary nodes. 



R. T. Prosser (11), in work done in 1959, describes the analysis of 
directed graphs by the use of boolean matrices. Two boolean matrices 
are associated with each graph: the first is called the connectivity 
matrix, and contains the topological structure of the diagram; the second 
is called the precedence matrix, and contains the precedence relations 
of the graph. 

The connectivity matrix is an n by n boolean matrix, A = (a_ij),
where n is the number of program blocks and a_ij = 1 if program block j
is just preceded by program block i. The precedence matrix is an n by n
boolean matrix, B_m = (b_ij), which is derived from the connectivity
matrix by performing elementary matrix computations on A exactly m
times. Depending on the operations used, b_ij = 1 can indicate that it
is possible to proceed from block i to block j in either exactly m steps
or at most m steps.
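
The two readings of the precedence matrix can be illustrated with a
small sketch. It is not taken from Prosser's paper; it assumes that the
"elementary matrix computations" are boolean matrix products (for
"exactly m steps") and boolean sums of those products (for "at most m
steps"). The function names and the four-block example program are
hypothetical.

```python
from typing import List

Matrix = List[List[int]]

def bool_mul(x: Matrix, y: Matrix) -> Matrix:
    """Boolean product: result[i][j] = OR over k of (x[i][k] AND y[k][j])."""
    n = len(x)
    return [[int(any(x[i][k] and y[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]

def bool_or(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise boolean sum of two matrices."""
    n = len(x)
    return [[x[i][j] | y[i][j] for j in range(n)] for i in range(n)]

def exactly(a: Matrix, m: int) -> Matrix:
    """b[i][j] = 1 if block j can follow block i in exactly m steps (m >= 1)."""
    b = a
    for _ in range(m - 1):
        b = bool_mul(b, a)
    return b

def at_most(a: Matrix, m: int) -> Matrix:
    """b[i][j] = 1 if block j can follow block i in 1, 2, ..., or m steps."""
    b, power = a, a
    for _ in range(m - 1):
        power = bool_mul(power, a)
        b = bool_or(b, power)
    return b

# Hypothetical program: 0 -> 1, 1 -> 2, 2 -> 1 (a loop), 2 -> 3.
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 0, 0]]

two_steps = exactly(A, 2)
within_two = at_most(A, 2)
```

In this example block 3 is not reachable from block 0 within two steps,
but it is reachable within three, so the choice of m matters.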

C. V. Ramamoorthy (12), in work done in 1965, uses the connectivity 
matrix and precedence matrices to determine the structural characteristics 
of the program represented by the boolean matrices. He presents algorithms 
for detecting blocks which cannot be reached from the starting block; 
for finding which blocks are included in at least one loop; for par- 
titioning a graph into its unconnected subgraphs; and for determining 
the entry and exit blocks. Obviously, these determinations are of only 
incidental interest in understanding a procedure or deriving its flow- 
chart. For a general review of graph theory, see C. Berge (1). 
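
The kinds of structural questions listed above can be answered from the
transitive closure of the connectivity matrix. The following is a
minimal sketch using Warshall's algorithm; it is not Ramamoorthy's own
formulation, and the definitions of "entry" (no predecessor) and "exit"
(no successor) blocks, like the five-block example, are assumptions made
for illustration.

```python
from typing import List, Set

Matrix = List[List[int]]

def closure(a: Matrix) -> Matrix:
    """Warshall's algorithm: c[i][j] = 1 if j is reachable from i in >= 1 step."""
    n = len(a)
    c = [row[:] for row in a]
    for k in range(n):
        for i in range(n):
            if c[i][k]:
                for j in range(n):
                    c[i][j] |= c[k][j]
    return c

def unreachable_blocks(a: Matrix, start: int = 0) -> Set[int]:
    """Blocks that can never be reached from the starting block."""
    c = closure(a)
    return {j for j in range(len(a)) if j != start and not c[start][j]}

def blocks_in_loops(a: Matrix) -> Set[int]:
    """Blocks that can reach themselves, i.e. lie on at least one loop."""
    c = closure(a)
    return {i for i in range(len(a)) if c[i][i]}

def entry_blocks(a: Matrix) -> Set[int]:
    """Assumed definition: blocks with no predecessor."""
    n = len(a)
    return {j for j in range(n) if not any(a[i][j] for i in range(n))}

def exit_blocks(a: Matrix) -> Set[int]:
    """Assumed definition: blocks with no successor."""
    return {i for i in range(len(a)) if not any(a[i])}

# Hypothetical program: 0 -> 1, 1 -> 2, 2 -> 1, 2 -> 3, 4 -> 3.
A = [[0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 1, 0]]
```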



2.2.2 Program Analysis of Compiler-Language Source Programs

L. Krider (8), in work done in 1964, describes an algebraic
representation of the control flow of a computer program and presents an
algorithm for manipulating such a representation into a form which could
be used to draw a flowchart. The algorithm works on the assumption that
the principal information about program flow is contained in its loop
structure. The algorithm also requires that all possible destinations
of all transfer instructions be known in advance. Thus, this
procedure can only be used on algebraic source-language programs. Such a
"pattern of code" is far more restrictive than is utilized in
assembly-language programming.



2.2.3 Program Analysis of Machine-Language Source Programs 

L. M. Haibt (3), in work done in 1959, describes a program, the
FLOWCHARTER, which automatically produces flowcharts of programs whose
instructions are fixed and not modified or calculated during execution.
The output of the FLOWCHARTER is a set of flowcharts showing various levels
of detail, where each part of a chart is shown in more detail on a
succeeding chart. The FLOWCHARTER is divided into four main parts:
preprocessing, flow analysis, computation summary, and output.

The preprocessors transform input source-language instructions into
an internal language. This permits the FLOWCHARTER to handle different
source languages by simply using the proper preprocessor. The flow
analysis program determines what information goes on each flowchart
level. This routine first determines individual blocks and then groups
the smaller blocks together into larger blocks. The computation summary
program determines, for each block, which cells are used in
input/output, which cells are used in calculations, and which cells are
calculated. No functional relationships are derived; only the variable
names are listed. The output program prints the various flowcharts.

H. M. Teager, in an unpublished work, developed a cross-referencing 
program. The input of the program is a 709 FAP source-language program,
while the output is a program listing plus cross-reference information.
For each instruction location, the cross-reference information indicates
the location of all instructions in the program that might affect the
given instruction. For example, if an instruction changes or uses the 
contents of a cell, all locations which similarly modify or use that 
cell are listed beside the given instruction. Although helpful, sometimes 
the sheer volume of output makes the information useless. 



2.2.4 Presentation of Program Analysis Results 

G. Hain and K. Hain (4) have developed a program which will draw 
flowcharts. The blocks of the chart are positioned so that logically- 
close blocks are physically close, and there is a minimum number of 
connecting-line crossings. Likewise, W. Sutherland, in an unpublished
work, used the SKETCHPAD program developed by I. Sutherland (15), to
display flowcharts. In both of these works, output presentation was
the major concern, and the necessary machine analysis was assumed to
have been derived by other means. 



2.3 PURPOSE AND SCOPE OF THIS THESIS 

This paper has two purposes. The first is to present algorithms 
for analyzing programs which modify their control and processing
instructions in the course of execution. Examples of such
self-modification are computed changes in operation code or operand
address of instructions. The second purpose of this paper is to present
data structures which will permit a functional expression of program
data or information processing. These algorithms and data structures
were utilized in a program analysis system which produced data and
control flowcharts from assembly-language code. Even though the
procedures and data structures were developed for a specific computer
and its assembly language, the results are of general theoretic and
practical interest.
The machine incorporates all of the most sophisticated operations of 
any existing machine short of a true multiprocessor, and thus, there 
are no major "surprises" to be expected from minor perturbations in the 
common structure of forthcoming machines in the near future, whether 
more or less powerful. 

The analysis and display procedures are general in scope; the
concepts apply to all machines and all programs. For purposes of
experimentation, the analysis and display algorithms were written for
the IBM 7094 single-address machine (5) and the FAP assembler language (6).
Input to the analysis program is the BCD listing produced by the FAP 
assembler. Output from the analysis program is a flowchart, where 
block interconnections show the program control flow and symbolic 
functional expressions inside the blocks show the program data or 
information flow. In addition, pertinent cross-reference information is 
given beside each block. This information permits a human user to 
begin analyzing the program at a more sophisticated level if the
automatic procedures break down. Sufficient routines have been written
to validate the proposed analysis algorithms and evaluate the results
of the analysis programs.






CHAPTER 3
A DISCUSSION OF THE ANALYSIS OF SELF-MODIFYING PROGRAMS 

The purpose of this chapter is to introduce the major problems of
automatic analysis of self-modifying programs. First, a comparison of
the properties of non-modifying and self-modifying programs with respect
to data and control flow leads to a statement of the general analysis
problem. Second, the general solution procedure of successive
approximations utilized to solve this problem is outlined. Third, the
problems introduced by the solution procedure are discussed. Finally,
examples of self-modifying programs further illustrate the analysis
problems.
In the description to follow, moderate familiarity with assembly-language 
programming and the specific mnemonics and conventions of IBM's FAP will 
be assumed (5 and 6). 



3.1 THE GENERAL ANALYSIS PROBLEM 

Before the general analysis problem is stated, it is useful to
review the special case of programs which do not modify themselves.
This review describes the special property of non-modifying programs
which permits a straightforward, direct analysis procedure.

If a program is non-modifying, the set of all possible outcomes
for each instruction is a function of the instruction itself and is
independent of all other program instructions. For example, an absolute
transfer instruction, TRA Y, is an independent instruction because all
of its outcomes are determined by the instruction itself. On the other
hand, a tagged transfer instruction, TRA Y,1, is a dependent instruction
because its outcomes are a function of the contents of the index register
and thus of the instructions and data which affected it. There is a wide
class of such dependent instructions which must be treated in the general
case.
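
As a rough illustration of the distinction, the following sketch
classifies a single instruction from its own fields. It is a deliberate
simplification, not the classification used by the analysis system: it
treats an instruction as dependent only when it carries a nonzero tag or
a "**" placeholder address, and it ignores instructions whose tag merely
names an index register to be loaded or stored, as well as instructions
that are overwritten by other instructions. The `Instruction` record and
`is_dependent` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instruction:
    op: str        # operation mnemonic, e.g. "TRA" or "CLA"
    address: str   # symbolic address field, e.g. "Y" or "**"
    tag: int = 0   # index register tag; 0 means untagged

def is_dependent(ins: Instruction) -> bool:
    """True if the outcome set cannot be read off the instruction alone.

    Simplifying assumption: only tagged addresses and "**" placeholder
    addresses (filled in during execution) are treated as dependent.
    """
    return ins.tag != 0 or ins.address == "**"

absolute_transfer = Instruction("TRA", "Y")      # TRA Y
tagged_transfer = Instruction("TRA", "Y", 1)     # TRA Y,1
computed_address = Instruction("CLA", "**")      # CLA **
```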

The independence property of non-modifying programs permits a
straightforward, two-step analysis procedure. First, the program
control flow is determined by finding the outcome sets of all the transfer
or control instructions. These results are used to draw the flowchart
box outlines and interconnections. Second, the program data flow is
determined by finding the outcome sets of all the information processing 
instructions. These results are then processed as a function of the 
control flow to produce the symbolic functional expressions for inside 
the flowchart boxes. In summary, the independence property permits a 
two-step analysis procedure because the control flow can be found with- 
out regard to the data flow. 

However, if a program is self-modifying, the above two-step analysis
procedure cannot be used because it assumes instruction independence.
If a program contains dependent instructions, such as a tagged transfer
instruction, the control and data flows are a function of each other.
The outcome set of a tagged transfer is a function of the index register
loading instructions, but the set of index loading instructions can be a
function of the outcomes of the tagged transfer instruction itself.
Because of this control flow - data flow interaction, a new analysis
procedure is needed for self-modifying programs. To be feasible, such
a procedure must perforce fall short of a complete dynamic analysis of
the program's execution, and instead consider just a few static
iterations.



3.2 THE GENERAL ANALYSIS PROCEDURE 

If the control flow and data flow of a self-modifying program are 
to be determined, a procedure must be found for handling the control 
flow - data flow interaction cycle. This cyclic behavior of self- 
modifying programs suggests the use of an iterative or a relaxation 
solution technique. 

Since data flow is always a function of control flow, the initial 
step in the relaxation solution procedure should determine a first 
approximation to the control flow. The second step would then determine 
a first approximation to the data flow as a function of control flow. 
The first two steps would be repeated until all the outcomes of all the 
dependent instructions have been found and the analysis results have 
reached a steady-state condition. Only then can the control flow results 
be used to construct the flowchart box outlines and interconnections, 
and the data flow results to produce the symbolic functional expressions 
for inside the flowchart boxes. 
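
The iteration described above can be sketched as a fixed-point loop.
This is only an illustration under strong assumptions: the toy machine
has two invented operations, LOADX (load a constant into the single
index register) and JUMPX (transfer to the address held in that
register), which are not FAP instructions, and the control and data
steps are reduced to a few lines each. All names are hypothetical.

```python
def reachable(edges, start=0):
    """Locations reachable from start along the known control edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def control_step(program, data):
    """Approximate control flow, using whatever data flow is known so far."""
    targets = data["XR"] if data else set()
    edges = set()
    for loc, (op, arg) in program.items():
        if op == "LOADX":
            edges.add((loc, loc + 1))                 # falls through
        elif op == "JUMPX":
            edges.update((loc, t) for t in targets)   # dependent transfer
    return edges

def data_step(program, control):
    """Approximate data flow: index values loaded on reachable paths."""
    live = reachable(control)
    return {"XR": {arg for loc, (op, arg) in program.items()
                   if op == "LOADX" and loc in live}}

def relax(program, max_iterations=10):
    """Repeat the two steps until neither result changes."""
    control, data = None, None
    for iteration in range(1, max_iterations + 1):
        new_control = control_step(program, data)
        new_data = data_step(program, new_control)
        if new_control == control and new_data == data:
            return control, data, iteration
        control, data = new_control, new_data
    raise RuntimeError("no steady state within the iteration limit")

program = {0: ("LOADX", 3), 1: ("JUMPX", None),
           2: ("HALT", None), 3: ("HALT", None)}
control, data, iterations = relax(program)
```

The first pass finds only the fall-through edge; the second adds the
transfer target once the index value is known; the third confirms that
nothing changes.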



3.3 THE RELAXATION SOLUTION PROBLEMS

The relaxation solution procedure is the iterative application of 
the two-step analysis process for non-modifying programs. Because of 
the control flow - data flow interaction cycle of self-modifying
programs, both steps must be modified. The purpose of this section is to
review the problems solved by the two-step procedure and to show how this 
process must be modified to solve the relaxation problems. 



3.3.1 Control Flow Modifications 

Control flow represents the program instruction execution sequence 
and is used to construct the flowchart box outlines and interconnections. 
This execution sequence can be modeled by a directed graph where nodes 
represent flowchart boxes and directed branches represent box inter- 
connections. More specifically, let each node of the control graph re- 
present a program block. Let a block be defined as a sequential set of 
instructions between a transfer entry point and the next transfer entry 
or exit point. Thus, a block is completely processed once its first 
member instruction is executed. Therefore, a directed graph whose 
nodes represent program blocks displays only execution sequence infor- 
mation. The major control flow graph construction problems are breaking 
the program into blocks and then interconnecting those blocks in proper 
sequence. Now, the differences between finding the control graph of a 
non-modifying program and of a self-modifying program are discussed. 
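
The block definition above can be sketched as follows. The sketch
assumes that instructions occupy consecutive locations 0 to n-1 and that
the known transfers are given as (source, target) pairs; the function
name and the example are hypothetical. For a self-modifying program the
list of transfers would initially be incomplete, so the result is only a
first approximation.

```python
def partition_blocks(n, transfers):
    """Split locations 0..n-1 into blocks, returned as (first, last) pairs.

    A block starts at location 0, at every transfer target (entry point),
    and immediately after every transfer instruction (exit point).
    """
    starts = {0}
    for src, dst in transfers:
        starts.add(dst)            # an entry point begins a block
        if src + 1 < n:
            starts.add(src + 1)    # an exit point ends the current block
    ordered = sorted(starts)
    return [(s, e - 1) for s, e in zip(ordered, ordered[1:] + [n])]

# Hypothetical 8-instruction program: location 2 transfers to 5,
# and location 6 transfers back to 1.
blocks = partition_blocks(8, [(2, 5), (6, 1)])
```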

The first control graph construction step is the detection of all
control or transfer instructions. Each of these instructions generates
a set of outcomes, i.e. entry and exit points. For non-modifying
programs, all entry and exit points can be determined from the individual
control instructions. Figure 3.1a shows examples of entry and exit
points generated by independent control instructions. However, in the
case of self-modifying programs, some entry and exit points cannot be
immediately determined because of dependent instructions. Figure 3.1b
shows an example of such a dependent instruction, the tagged transfer,
where the entry points cannot be determined from the transfer
instruction itself. Therefore, the control graph construction procedure
must be modified to handle missing entry and exit points.



Figure 3.1 - Entry and Exit Points
a. Entry and Exit Points Generated by Independent Instructions
b. Entry and Exit Points Generated by a Dependent Instruction



In the second construction step the entry and exit points are
processed to determine the program blocks. In the non-modifying case, the
application of the block definition is straightforward. In the
self-modifying case, some entry and exit points are initially missing.
Therefore, modification to the block definition is required so that a
first approximation to the program blocks can be made.

The third construction step interconnects the blocks or nodes in
the proper execution sequence. In the case of non-modifying programs,
all interconnections can be made because all control instruction
outcomes are known and blocks are completely defined. In the case of
self-modifying programs, some block connections cannot be made because of
incomplete control instruction outcome sets. Therefore, the block
interconnection procedure must be modified so that assumed control graph
branches can be inserted at points where incomplete outcomes occur.

The final construction step places the control flow information
into some data structure. The control flow information of a
non-modifying program can be stored in a rigid data structure because its
information is completely known and is not changed by later analysis.
However, the data structure used to represent the self-modifying program
needs to be flexible because it contains information which might be
updated by later analysis results.



3.3.2 Data Flow Modifications

Data flow represents the data or information processing performed 
by the program and is used to generate the functional expressions for 
inside the flowchart boxes. This data processing can be modeled by a 
directed graph where the nodes represent cell references or operators 
and the directed branches represent the processing sequence. A cell 
is either a memory location or a central processor register. An operator 
is a machine operation, such as ADD or MULTIPLY. 

The data flow graph removes the sequential constraint imposed by 
the digital computer. This removal permits a better presentation of 
the program's data processing algorithm by removing references to tem- 
porary storage and displaying parallel processing paths. The data flow 
is an implicit function of the control flow because control flow determines 
the order of instruction execution and thus the arrangement of data flow 
graph nodes and branches. Figure 3.2 shows a simplified program and its
data graph. 



Figure 3.2 - A Data Flow Graph 

The major data flow graph construction problems are determining where
and how each cell is referenced and then interconnecting those references 
in the proper sequence to form the data flow graph. Now, the differences 
between finding the data flow graph of a non-modifying program and of a 
self-modifying program are discussed. 

The first data graph construction step is the detection of all
instructions which change or use data or information. Each of these
instructions generates a set of outcomes, i.e. a set of references to
various cells. In the case of non-modifying programs, the reference
outcomes of each instruction can be found from the instruction itself.
In the case of self-modifying programs, however, some outcomes may not
initially be known. For example, the cells referenced by the dependent
instruction, CLA **, cannot be determined until after the actual address
of the instruction itself has been found. Thus, the reference detection
procedure must be modified to handle dependent data referencing
instructions.

The second construction step determines the effect of each cell 
reference. The reference effect can be found from the instruction itself. 
Let a reference which changes the contents of a cell be known as an active 
reference. Let a reference which only uses the contents of a cell be 
known as a passive reference. For example, the CLA A instruction makes 
a passive reference to A and then an active reference to the accumulator, 
AC. The ADD B instruction first makes a passive reference to cells B 
and AC and then makes an active reference to the AC. 
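
The two examples above can be written as a small lookup table. This is
a sketch for illustration only: the table covers just CLA and ADD from
the text plus STO (store accumulator), which is added here as an
assumption to show an active reference to a memory cell. The names
`REFERENCES` and `references` are hypothetical.

```python
# Each entry maps an address Y to (passive cells, active cells).
REFERENCES = {
    "CLA": lambda y: ([y], ["AC"]),         # AC <- C(Y)
    "ADD": lambda y: ([y, "AC"], ["AC"]),   # AC <- C(AC) + C(Y)
    "STO": lambda y: (["AC"], [y]),         # Y  <- C(AC)  (assumed addition)
}

def references(op, address):
    """Return the cells an instruction uses (passive) and changes (active)."""
    passive, active = REFERENCES[op](address)
    return passive, active

cla_refs = references("CLA", "A")
add_refs = references("ADD", "B")
sto_refs = references("STO", "R")
```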

The third construction step determines the processing sequence of
the data references. When a program makes a passive reference to a cell,
it obtains the contents placed there by that cell's latest executed
active reference. In a static analysis it is only possible to find all
possible latest active references for each passive reference; only a
dynamic or interpretive process can detect the single latest active
reference. The latest reference set for each passive reference can be
found by searching back through the program as a function of the control
flow until all control paths are terminated by an active reference.
Figure 3.3 shows examples of latest reference sets. The dashed arrows
indicate latest references produced by passive reference - active
reference matches. In the case of non-modifying programs, all data
references are known and control flow is completely determined. Such is
not the case for self-modifying programs. Since individual passive
references can be missing, not all the latest reference sets may be found.
Since individual active references can also be missing, latest reference
searches may be improperly terminated. Finally, since control flow paths
can be missing because they are functions of yet to be determined data
flow, latest reference searches may be incorrect. Thus, the latest
reference searching procedure must be modified to handle dependent
instructions.

Figure 3.3 - Latest Reference Sets
(a. Dual Search Path; b. Loop Search Path; c. Parallel Search Path)
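
The backward search for a latest reference set can be sketched as
follows for the non-modifying case, where all references and control
paths are known. This is a minimal Python sketch under stated
assumptions: the block and predecessor dictionaries, the tuple layout,
and the function name are hypothetical and are not the data structures
of the analysis program.

```python
def latest_reference_set(cell, block, index, blocks, preds):
    """Collect every active reference to `cell` that could be the latest
    one seen by the passive reference at position `index` of `block`.

    blocks: block name -> list of (location, kind, cell) in program order,
            where kind is "active" or "passive"
    preds:  block name -> list of blocks that can pass control to it
    """
    found, visited = set(), set()
    # Each work item: (block name, number of leading references to scan).
    stack = [(block, index)]
    while stack:
        name, upto = stack.pop()
        for loc, kind, ref_cell in reversed(blocks[name][:upto]):
            if kind == "active" and ref_cell == cell:
                found.add(loc)  # this control path is terminated
                break
        else:
            # No active reference found here: continue into predecessors.
            for p in preds.get(name, []):
                if p not in visited:
                    visited.add(p)
                    stack.append((p, len(blocks[p])))
    return found
```

With two predecessor blocks that each store into the cell, the search
returns both stores (the dual search path case). With a block that is
its own predecessor, the store later in the loop body is also returned
(the loop search path case).
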

The final construction step places the latest reference information
into a data structure which permits the generation of symbolic functional
expressions for inside the flowchart boxes. The data structure must
allow the analysis program to carry latest reference expressions forward
to each passive reference that needs them. The data structure must
also permit the analysis program to compress and simplify those
functional expressions. Figure 3.4 shows examples of functional expressions.
The second expression in each example is preferred. In the non-modifying
program, all control paths and data references are known. Therefore,
the latest reference structure can be rigid, and the functional
expressions are final. In the self-modifying program case, some
control paths and data references can be missing. The latest reference
data structure must be flexible because its information may be changed
in later iterations.

Figure 3.4 - Functional Expressions



3.4 DEPENDENT INSTRUCTIONS 

Because of the large number of machine instructions and assembly 
pseudo-operations in the FAP assembly-language, it is necessary to limit 
the number and format of dependent instructions which the automatic 
analysis program will initially handle. The purpose of this section is 
to list and describe these dependent instructions. 



3.4.1 The Transfer Switch 

The first example of a control flow - data flow interaction problem
is the transfer switch. A transfer switch occurs when a program changes
its execution path by replacing or modifying its own instructions.
Figure 3.5a shows one of the many forms of the transfer switch. In this
example, the transfer instruction at location A is picked up and stored
over an existing instruction at location B. When the program next
reaches location B, control will be switched to location C. The transfer
instruction at location A is dependent because its outcome is a function
of its storing instruction. In this example the control flow problem of
determining which location receives control from the switch interacts
with the data flow problems of detecting the switch and determining its
location.

Figure 3.5 - Dependent Instructions



3.4.2 The Subroutine Call and Return 

The second example of control flow - data flow interaction is the
subroutine call and return. Figure 3.5b shows its general form. In
this example, the subroutine is called by the calling instruction, TSX.
The calling instruction is followed by a set of locations which form the
subroutine calling sequence. The calling sequence set may be empty.
The calling sequence is followed by a set of subroutine return locations,
i.e. locations to which the subroutine transfers control when it is
finished. Here too, the return set may be empty. The subroutine call
and return sequence is dependent because its outcomes are a function
of the subroutine itself. In this example the control flow problems of
determining the length of the calling sequence and the number of return
locations interact with the data flow problem of finding where and how
the subroutine calculates its return.



3.4.3 The Calculated Transfer 

The third example of a control flow - data flow interaction is the
calculated transfer instruction. A calculated transfer occurs when a
transfer instruction calculates its possible outcomes, i.e. the set of
locations to which it transfers control. Figure 3.5c shows one of the
forms of the calculated transfer, the tagged transfer. The tagged
transfer uses its address and tag to determine which location receives
control. Thus, the tagged transfer is a dependent instruction because
its set of outcomes is a function of the index loading instruction.
In this example the control flow problem of finding the set of locations
which can receive control from the tagged transfer interacts with the
data flow problem of finding where and how the index register is loaded.



3.4.4 The Modified Instruction 

The fourth example of a control flow - data flow interaction is the 
modified instruction. A modified instruction occurs when a program 
modifies or changes a portion of an existing instruction. Figure 3.5d
shows one of the many forms of the modified instruction. In this example 
the address portion of the instruction at location B is changed by the 
previous instruction. The instruction at location B is dependent because 
its outcome is a function of its modifying instruction. In this example 
the data flow problem of determining the new address portion of location B 
interacts with the control flow problem of finding which locations change 
the address portion of location B. 



3.4.5 The Indirect Address

The fifth example of control flow - data flow interaction is the
indirectly addressed instruction. Figure 3.5e shows one of the forms of
indirect addressing. In this example the instruction at location A
uses the address portion of location B to determine which location it 
references. The indirect address instruction at location A is dependent 
because its outcomes are a function of the instruction which last changed 
the address portion of location B. In this example the data flow problem 
of determining the address portion of location B interacts with the 
control flow problem of finding where that address was last changed. 



3.4.6 The Tagged Address 

The last example of control flow - data flow interaction is the tagged
address instruction. A tagged address occurs when an instruction uses
an index register to calculate its effective address. Figure 3.5f shows
an example of a tagged address instruction. In this example the instruction
at location A uses index register one to calculate which location is
picked up from the table at location B. The tagged address instruction
is dependent because its outcome is a function of the index loading
instruction. In this example the data flow problem of deciding which
location is picked out of the table interacts with the control flow problem
of determining where the index register was last loaded.



CHAPTER 4
THE ANALYSIS SOLUTION 

In the previous chapter a comparison of the properties of non- 
modifying and self-modifying programs led to the definition of 
independent and dependent instructions. The dependent instructions of 
self-modifying programs caused control flow - data flow interaction 
requiring an iterative analysis procedure. The problems introduced
by iteratively applying the straightforward, two-step analysis
procedure for non-modifying programs were discussed.

This chapter presents the approximation procedures used by the 
first iteration to bootstrap itself through the control flow - data 
flow interaction cycle discussed in Chapter 3. First, the solution 
philosophy required for a successful analysis is stated. Second, the 
general organization of the first iteration is outlined. This outline 
describes the data acquisition and data processing sequence and shows 
the use of intermediate data flow analysis results to improve control 
flow approximations and vice versa. Finally, a more detailed presenta- 
tion describes how the control and data flow steps handle the dependent 
instructions listed in Chapter 3. 



4.1 THE SOLUTION PHILOSOPHY

If an automatic program analysis system is to be successful, it
should be able to analyze long, core-length programs, such as assemblers
and compilers. When long programs are analyzed, the analysis system
may generate intermediate data tables that are at least two or three
times as long as the original input program. Because it may not be
possible to retain all of the intermediate tables in core, these
results should be placed on external lists. Because of these large,
external data lists, the analysis procedure should wherever possible
consist of sorting, merging, and scanning. Any searching of these
lists or other data structures should be avoided or delayed whenever
possible. If this data processing philosophy is to be successful,
a set of temporary result lists and a processing sequence must be
developed.



4.2 THE FIRST ITERATION 

Because the first iteration uses intermediate data flow analysis
results to improve its control flow approximations and vice versa, a
general outline of the first iteration organization would be helpful
before the detailed dependent instruction solutions are discussed.
The first iteration is divided into four parts: Data Gathering, Data
Processing, Data Reduction, and Function Generation and Output. The
organization and information processing are also graphically displayed
in Figure 4.1 and Figure 4.2.

Figure 4.1 - The First Iteration Organization

Figure 4.2 - The First Iteration Information Processing



4.2.1 Data Gathering

The first phase transforms the input program from a set of assembly- 
language instructions into a set of temporary data lists. The input 
program is scanned one line at a time. First, the line is decoded 
and interrogated for such information as octal instruction, its 
assigned memory location, BCD instruction operation code, and absence 
or presence of a tag or indirect address. The assigned memory location 
and octal instruction were produced by the FAP assembler. They are 
used by the analysis program as bookkeeping aids for generating list 
or table entries, e.g. the assigned memory location is used in each 
table entry so that later analysis phases can determine which 
instruction originally generated the entry. The BCD operation code 
is used to decode the instruction because it permits some "interpreta- 
tion" of programmer intent, e.g. data and storage pseudo-operations 
can be distinguished from executable instructions. Tagged and indirectly 
addressed instructions are detected so that special analysis procedures 
can be initiated. 

Second, entries are added to the various data lists according to the 
BCD operation code. For transfer instructions, entries are added to 
the various Transfer Lists, e.g. the Entry and Exit Point Lists. For 
referencing instructions, entries are added to the Active and Passive 
Reference Lists. For data generation pseudo-operations, entries are 
added to the Data List. For storage generation pseudo-operations, 
entries are added to the Storage List, etc. Each list entry uses
information decoded from the original instruction, e.g. if the instruction
is tagged or indirectly addressed, special flags are set in its entries
so as to alert later analysis phases.
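
The routing of decoded lines onto the temporary lists might be sketched
as below. This is a simplified Python illustration, not the analysis
program: the op code subsets, the tuple layout of a decoded line, and
the single "transfer" and "reference" lists (standing in for the Entry
and Exit Point Lists and the Active and Passive Reference Lists) are
all assumptions made for the example.

```python
TRANSFERS = {"TRA", "TSX", "TZE"}   # assumed subset of transfer op codes
REFERENCES = {"CLA", "ADD", "STO"}  # assumed subset of referencing op codes
DATA_OPS = {"DEC", "OCT"}           # data generation pseudo-operations
STORAGE_OPS = {"BSS", "BES"}        # storage generation pseudo-operations

def gather(lines):
    """lines: iterable of (location, op, address, tag, indirect) tuples."""
    lists = {"transfer": [], "reference": [], "data": [], "storage": []}
    for loc, op, address, tag, indirect in lines:
        entry = {
            "loc": loc,  # assigned memory location, kept for bookkeeping
            "op": op,
            "address": address,
            # Flag tagged or indirectly addressed instructions for later phases.
            "flagged": bool(tag) or bool(indirect),
        }
        if op in TRANSFERS:
            lists["transfer"].append(entry)
        elif op in REFERENCES:
            lists["reference"].append(entry)
        elif op in DATA_OPS:
            lists["data"].append(entry)
        elif op in STORAGE_OPS:
            lists["storage"].append(entry)
    return lists
```

Each entry keeps the assigned memory location so that a later phase can
tell which instruction generated it, as described above.
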



4.2.2 Data Processing 

The second phase determines program properties by using data 
processing techniques on the temporary data lists. In general, the 
lists are sorted to place them in proper order and then sequentially 
scanned to detect program properties. 

First, general program properties are detected. Transfer Lists 
are sorted and scanned to determine first approximations to subroutine 
return points. These new entry and exit points are added to the Entry
and Exit Point Lists. The Reference Lists are sorted and scanned to 
detect which portions of each cell are actively referenced; which cells 
are only passively referenced, i.e. constants; and which cells are only 
actively referenced, i.e. results. 

Second, special program properties are determined. Modified
instructions are detected by comparing each Active Reference List
entry with those on the Data and Storage Lists. If a proper match is
not found, the actively referenced location is flagged as a possible
modified instruction. Possible transfer switch locations are found
by comparing each entry on the Passive Reference List against all
entries on the Exit Point List. A match indicates a passive reference
to a location which contains a known transfer instruction. The matching
Exit Point entry and the Reference Lists are then used to find a first
approximation to the outcomes of the transfer switch. The new outcomes
are added to the Entry and Exit Point Lists.
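
The list comparisons of this phase can be illustrated with a short
Python sketch. Set operations stand in here for the sort-and-scan
passes over external lists described above; the function names are
hypothetical, and only memory cells (not registers such as the AC) are
assumed to appear in the inputs.

```python
def classify_cells(active, passive):
    """Split referenced cells into constants and results.

    active, passive: iterables of referenced cell locations.
    """
    a, p = set(active), set(passive)
    return {
        "constants": p - a,  # only passively referenced
        "results": a - p,    # only actively referenced
    }

def possible_modified_instructions(active, data_locs, storage_locs):
    """Flag actively referenced cells that match no Data or Storage entry."""
    declared = set(data_locs) | set(storage_locs)
    return sorted(set(active) - declared)
```

A cell that is stored into but is declared neither as data nor as
storage is reported by the second function as a possible modified
instruction.
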



4.2.3 Data Reduction

The third phase transforms the processed temporary data lists into
more convenient data structures. Generally, this involves sorting the 
lists into proper order and then placing each list entry into a new 
data structure by either scanning or searching the list. 

First, the Transfer Lists which contain sorted entry and exit 
point information are transformed into Control Tables which represent 
the approximated control flow graph. The Entry and Exit Point Lists 
are used to break the program into blocks and to interconnect those 
blocks. This topological information is then represented in the Control 
Tables. Finally, the Control Tables are interrogated to detect unreachable 
blocks and to approximate and to insert missing control branches. 

Second, the Reference Lists are re-sorted and transformed into
Reference Tables by associating each Active and Passive Reference List
entry with the block in which it occurs. Next, the "latest reference
set" for each passive reference is found by searching the Control and
Reference Tables. Finally, the latest reference information is placed
into a suitable data structure.



4.2.4 Function Generation and Output

The fourth phase transforms the Latest Reference Tables into 
functional expressions and places those expressions in a suitable data 
structure for final output. 



4.3 THE CONTROL FLOW SOLUTIONS 

This section presents the solution techniques used to solve the 
control flow problems discussed in Chapter 3. First, the control flow 
graph structure is presented so that the end result is known in advance. 
This discussion includes the desired structure properties and a structure 
which incorporates those properties. Second, the solution techniques 
used to bootstrap through the dependent instruction interaction cycle 
are presented. These techniques include detecting the entry and exit 
points, determining the program blocks, and interconnecting the blocks. 



4.3.1 The Control Graph Data Structure 

The data structure which contains the control flow information must
have two characteristics. First, the structure must permit forward and
backward movement in the control flow graph. Forward, because the program
is executed in that direction; backward, because the latest reference
search is easier to program for that direction. Second, the structure
must permit expansion and contraction of the control flow graph. Expansion,
because later analysis iterations may detect new blocks; contraction,
because those same iterations may wish to rejoin blocks.

A modification of Ross's plex (14) produces a data structure which 
incorporates the proper characteristics. The complete structure will be 
referred to as the Control Tables and is composed of three separate 
tables: the Topology Table, the To Table, and the From Table. Figure 4.3
shows the general component of each of these three tables. 

Figure 4.3 - The Control Tables 

The Topology Table serves as the "card catalogue" for analysis
results. When the analysis program needs information about a given
block, it can be found through the Topology Table once the Block Number,
I, is known. The Topology Table entries are numbered sequentially with
the starting program block coming first, the second block second, etc. A
Topology Table entry is composed of seven sequential words. The first
word contains the STARTing and ENDing location of the particular block.
The second word is the "catalogue card" for the blocks which can be
reached from the particular block. The left half contains the count
of those blocks, and the right half points into the To Table where the
Block Numbers of those reachable blocks are stored. The third word is
the "catalogue card" for the blocks which can pass control to this
particular block and is constructed similarly to the second word. The
fourth through seventh words are reserved for data flow information and
will be discussed in a later section.

The To Table contains a variable length entry containing the Block 
Number of each block reachable from the given block. Likewise, the 
From Table contains a variable length entry containing the Block Number 
of each block which can pass control to the given block. 
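
The layout of the three Control Tables can be sketched in Python as a
list of fixed entries plus two flat tables of variable-length runs.
This is an illustrative sketch only: the class and function names are
hypothetical, block numbers start at 0 here for convenience, and the
fourth through seventh words are omitted.

```python
from dataclasses import dataclass

@dataclass
class TopologyEntry:
    start: int       # word 1: STARTing location of the block
    end: int         # word 1: ENDing location of the block
    to_count: int    # word 2, left half: number of reachable blocks
    to_ptr: int      # word 2, right half: index into the To Table
    from_count: int  # word 3, left half: number of blocks passing control
    from_ptr: int    # word 3, right half: index into the From Table

def build_control_tables(blocks, edges):
    """blocks: list of (start, end); edges: list of (from_block, to_block)."""
    topology, to_table, from_table = [], [], []
    for i, (start, end) in enumerate(blocks):
        succ = [t for f, t in edges if f == i]
        pred = [f for f, t in edges if t == i]
        topology.append(TopologyEntry(start, end, len(succ), len(to_table),
                                      len(pred), len(from_table)))
        to_table.extend(succ)
        from_table.extend(pred)
    return topology, to_table, from_table

def successors(i, topology, to_table):
    """Forward movement: blocks reachable from block i."""
    e = topology[i]
    return to_table[e.to_ptr:e.to_ptr + e.to_count]

def predecessors(i, topology, from_table):
    """Backward movement: blocks which can pass control to block i."""
    e = topology[i]
    return from_table[e.from_ptr:e.from_ptr + e.from_count]
```

Keeping both a To Table and a From Table is what permits movement in
both directions through the control flow graph, as required above.
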



4.3.2 Detecting the Entry and Exit Points 

During the Data Gathering Phase, entries are added to the temporary
Transfer Lists whenever a transfer or control type instruction is found.
If the data structure of these lists is to conform with the general
solution philosophy discussed earlier, the structure must permit
individual entries to be added as required but yet allow all entries
to be processed as a group.

These characteristics can be incorporated into two lists, the 
Entry Point List and the Exit Point List. The Entry Point List contains
the entry point entries, and the Exit Point List contains the exit 
point entries. The format of the list entries is shown in Figure 4.4. 
The "f" portion of each entry retains information about the function 
or purpose of the transfer instruction which generated the entry, 
e.g. remembers that the instruction was an absolute transfer, a subroutine 
call, or a tagged transfer. The "Entry Point" portion of each entry 
contains the core location of the entry point. The "Exit Point" portion 
of each entry contains the core location of the exit point. 

Figure 4.4 - The Entry and Exit Point List Formats 

Generating the Entry and Exit Point List entries involves detecting
all control instructions and determining their outcome sets. The outcome
of an independent control instruction can be determined from the
instruction itself. Figures 4.5 and 4.6 show examples of list entries
generated by independent instructions during the Data Gathering Phase.
Note that, except in special cases which are discussed later, Entry and
Exit Point List entries are made in pairs. This procedure facilitates
breaking the program into blocks. However, there is a small but
important percentage of control instructions which are dependent and whose
outcome sets cannot be determined by the Data Gathering Phase. Now,
three such dependent instructions are discussed to indicate how their
Entry and Exit Point List entries are generated.
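
The paired entries for independent instructions might look like the
following Python sketch. Since the list formats are given only in the
figures, the record layout (f, Entry Point, Exit Point) and the
treatment of a conditional transfer as two outcomes (its target and
the next sequential location) are assumptions made for illustration;
TZE is used merely as an example conditional op code.

```python
def add_pair(entry_list, exit_list, f, exit_point, entry_point):
    """Add one entry point - exit point pair to both lists."""
    record = (f, entry_point, exit_point)
    entry_list.append(record)  # later sorted on its "Entry Point" portion
    exit_list.append(record)   # later sorted on its "Exit Point" portion

def gather_transfer(entry_list, exit_list, loc, op, target):
    """Generate list entries for an independent transfer at location loc."""
    if op == "TRA":
        # Absolute transfer: a single outcome, the target location.
        add_pair(entry_list, exit_list, "absolute", loc, target)
    elif op == "TZE":
        # Conditional transfer (assumed): the target or the next location.
        add_pair(entry_list, exit_list, "conditional", loc, target)
        add_pair(entry_list, exit_list, "conditional", loc, loc + 1)
```
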

Figure 4.5 - The Entry and Exit Point Entries of an Absolute Transfer

Figure 4.6 - The Entry and Exit Point Entries of a Conditional Transfer

The first example of a dependent control instruction is the Transfer
Switch. Figure 4.7a shows how a Transfer Switch might occur in a program.
During the Data Gathering Phase, Entry and Exit Point List entries are
made for the TRA C instruction, and Active and Passive Reference List
entries are made for the CLA A and STO B instructions. During the Data
Processing Phase, the analysis program detects a passive reference to a
location containing a transfer instruction. In this case the Passive
Reference List contains a passive reference to location A generated by
the CLA A instruction, and the Exit Point List contains an entry at
location A generated by the TRA C instruction. Thus, the Data Processing
Phase knows that the CLA A instruction fills the accumulator with an
instruction that passes control to location C. It also knows that the
instruction is at location A and the "f" portion of its Exit Point List
entry indicates an absolute transfer, TRA. The Data Processing Phase
determines where the accumulator stores the transfer instruction by
noting that the "next" passive reference to the AC after the active
reference to the AC generated by the CLA A instruction is the STO B
instruction. Therefore, since the STO B instruction actively references
location B, the transfer instruction is stored into B. Because control
can be split two ways at location B, two entry point - exit point pairs
are added to the end of the lists as shown in Figures 4.7b and 4.7c. The
"f" portions of these new entries indicate generation by a Transfer Switch.
Note that care must be taken to determine whether or not the passive
reference which picks up the transfer is separated from the active
reference which stores the transfer by either an entry or exit point.
If the references are separated, the "correct" active reference cannot
be found until after the first approximation to the control flow has been
determined, i.e., during the second iteration. Figure 4.8 shows such a
case. The TRA N instruction is stored into location C, not Z. Finally,
the Data Processing Phase must determine whether the transfer instruction
which causes the switch can be executed in its original location. This
is done by seeing if there is a data or storage pseudo-operation on the
Data or Storage Lists in a location "just above" the location of the
transfer instruction. If there is, the Entry and Exit Point List entries
originally generated by the transfer are removed because the transfer
instruction "appears" to be included in a "data area" and is "probably"
not executed in its original location.

Figure 4.7 - The Transfer Switch

Figure 4.8 - Transfer Switch with Passive-Active Reference Separation
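
The Transfer Switch detection can be sketched for the simple case in
which the pick-up and the store are not separated by an entry or exit
point. This Python sketch is an illustration under stated assumptions:
only CLA and STO are modeled as accumulator references, locations are
symbolic names, and the function name and input layout are
hypothetical.

```python
def find_transfer_switches(refs, exit_points):
    """Find (switch location, new outcome) pairs in straight-line code.

    refs:        list of (location, op, address) in program order
    exit_points: dict mapping the location of a known transfer
                 instruction to the location it transfers to
    """
    switches = []
    for i, (loc, op, addr) in enumerate(refs):
        # Passive reference to a location holding a known transfer.
        if op == "CLA" and addr in exit_points:
            # Look for the "next" passive reference to the AC.
            for loc2, op2, addr2 in refs[i + 1:]:
                if op2 == "STO":
                    # The transfer is stored into addr2, so control at
                    # addr2 may now pass to the transfer's target.
                    switches.append((addr2, exit_points[addr]))
                    break
                if op2 == "CLA":
                    break  # AC reloaded before the transfer was stored
    return switches
```

For the program of Figure 4.7a as described in the text (CLA A followed
by STO B, with TRA C at location A), the sketch reports that location B
may pass control to location C.
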

The second example of a dependent control instruction is the 
Subroutine Call and Return. Figure 4.9a shows how a subroutine call can 
occur in a program. Subroutine return points must be found so that the 
proper Entry and Exit Point List entries are made and the program can 
later be broken into the correct blocks. For analysis purposes, there 
are two types of subroutines. The first type is the external subroutine 
which is assembled separately from its calling program and need not be 
available for analysis. An external subroutine can be detected by a 
call which transfers control to a location in the Transfer Vector, 
i.e. a location before the first executable instruction. The external 
subroutine return information must be supplied as input information 
along with the original input program. This information is processed 
during the Data Gathering Phase and is used to generate Entry and Exit 
Point List entries. 

The second type of subroutine is the internal subroutine. It is
assembled along with its calling program and is available for analysis.
During the Data Gathering Phase, a Subroutine Return List containing
internal subroutine calls and probable subroutine returns is constructed.
A subroutine is usually called in the FAP language by a TSX instruction.
A subroutine usually returns via a tagged, absolute transfer, such as a
TRA "small constant", 4. When a TSX instruction is found, a call entry
is added to the end of the Return List; when a probable subroutine return
instruction is found, a return entry is added to the end of the Return
List. Figure 4.10a shows an example of a program; Figure 4.10b shows its
Subroutine Return List; and Figure 4.10c shows its sorted Return List.
Note that in the sorted list, the returns for each subroutine are grouped
together under its entry point or starting location. This technique
assumes that all instructions of each subroutine are sequentially
grouped together, e.g. SUB1 and SUB2 do not have any common instructions
in Figure 4.10a. If subroutines do have common instructions, this
approximation procedure produces invalid return points which must be
corrected after the first approximation to control flow has been
determined, i.e. in a later iteration. Figures 4.9b and 4.9c show how
the entry point and exit point entries are added to the end of the lists
for each subroutine call.

Figure 4.9 - The Subroutine Call and Return

Figure 4.10 - The Subroutine Return List
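
The grouping of probable returns under subroutine entry points can be
sketched as follows. This Python sketch assumes, as the text does, that
each subroutine occupies a sequential group of locations. It further
assumes that a return of the form TRA n,4 following a TSX at location L
passes control to location L + n; the function name and input layout
are hypothetical.

```python
import bisect

def subroutine_return_points(calls, returns):
    """Approximate the return points of each internal subroutine call.

    calls:   list of (call_location, subroutine_entry) for TSX instructions
    returns: list of (return_location, n) for probable returns TRA n,4
    """
    entries = sorted({entry for _, entry in calls})
    offsets = {e: set() for e in entries}
    for ret_loc, n in returns:
        # Group each return under the nearest preceding entry point.
        k = bisect.bisect_right(entries, ret_loc) - 1
        if k >= 0:
            offsets[entries[k]].add(n)
    # Assumed return rule: TRA n,4 returns to call_location + n.
    return {call: sorted(call + n for n in offsets[entry])
            for call, entry in calls}
```

If two subroutines share instructions, a return may be grouped under
the wrong entry point, which is the invalid case noted above.
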

The third example of a dependent instruction is the calculated
transfer. Figure 4.11 shows how one form of the calculated transfer,
the tagged transfer, might occur in a program. Note that the tagged
transfer in Figure 4.11 has a symbolic or relocatable address and is
"probably" not a subroutine return. During the Data Gathering Phase,
only the location of the Exit Point is known, i.e. the location of the
tagged transfer instruction. Therefore, only a single Exit Point List
entry can be made and is shown in Figure 4.11c. Its "f" portion shows a
tagged transfer, and its "Entry Point" portion is flagged as unknown.
The problem of the missing entry points is passed on to later analysis
phases.

Figure 4.11 - The Calculated Transfer



4.3.3 Determining the Program Blocks

After the Data Gathering and Data Processing Phases detect the
control instructions and generate the Transfer List entries, the Data
Reduction Phase uses the lists to determine the program blocks. First,
the lists must be ordered. The Entry Point List is sorted on its
"Entry Point" column; the Exit Point List is sorted on its "Exit Point"
column. Second, the program is broken into blocks by sequentially
scanning the two lists and recognizing the various entry and exit point
patterns.

There are four different types of blocks which produce four
different Entry and Exit Point patterns. These are:

1. Blocks with both entry and exit points, 

2. Blocks with only exit points, 

3. Blocks with only entry points, and 

4. Blocks with neither entry points nor exit points. 

The patterns are recognized by detecting the occurrence of certain
mathematical relationships between the "Entry Point" portion of the
sorted Entry Point List entries and the "Exit Point" portion of the
sorted Exit Point List entries. Each list has its own pointer which
specifies the current entry on the list, e.g. the Entry Point List
Pointer specifies the Current Entry Point. The term Next Entry Point
refers to the next different entry after the current entry. Since both
lists have been sorted, the Next Entry Point is always greater than
the Current Entry Point. Likewise, the Next Exit Point
is always greater than the Current Exit Point. As the respective entries
are processed, the pointers are moved down the lists. The recognition
process is recursive, and the recognition expressions stated below
assume that all entries and exits for the previous block have been
processed.

1. The current block has both entries and exits:

      Current Entry = Previous Exit + "1"
      Current Entry < Current Exit
      Current Exit < Next Entry

2. The current block has only exits:

      Current Entry ≠ Previous Exit + "1"
      Current Entry > Current Exit

3. The current block has only entries:

      Current Entry = Previous Exit + "1"
      Current Exit > Next Entry

4. The current block has neither entries nor exits:

      Current Entry ≠ Previous Exit + "1"
      Current Entry < Current Exit

Figure 4.12a shows a flowchart outline which contains a block with
both entries and exits. Block Q can be reached from location b and
transfers control to locations l and y. Block Q starts at location j
and ends at location k. Figures 4.12b and 4.12c show the Sorted Entry
and Exit Lists. If Block P has already been formed, then the arrows on
the two sorted lists point to the current list entries. Block Q has
both entries and exits because the list entries satisfy the first set
of relationships shown above, i.e. j = i + 1, j < k, and k < l. The
START of Block Q is j, and the END is k. Figures 4.12c, d, and e show
the Control Tables for Block Q. Since there are two Exit List entries
with an "Exit Point" portion of k, there are two To Table entries,
l and y. Since there is only one Entry List entry with an "Entry
Point" portion of j, there is only one From Table entry, b. In this
example and those to follow, the entries in the To and From Tables are
core locations, not Block Numbers. The core locations are replaced by
Block Numbers after the program has been broken into blocks.

Figure 4.12 - A Block with both Entry Point and Exit Point Entries

Figure 4.13a shows a flowchart outline which contains a block with
only exits. Block Q only exits to location y. (The entry at i + 1 can
be missing because of a calculated transfer not generating its entry
point entries during the Data Gathering Phase.) Block Q starts at
location i + 1 and ends at location j. Figures 4.13b and 4.13c show
the Sorted Entry and Exit Lists. If Block P has already been formed,
then the arrows on the two sorted lists point to the current list entries.
Block Q has only exits because the list entries satisfy the second set
of relationships shown above, i.e. k ≠ i + 1 and k > j. The START of Block Q
is i + 1, and the END is j. Figures 4.13c, d, and e show the Control
Tables for Block Q. Since there is one Exit List entry with an "Exit
Point" portion of j, there is one To Table entry, y. Since there are
no Entry List entries with an "Entry Point" portion of i + 1, there are
no From Table entries for Block Q.

Figure 4.13 - A Block with only Exit Point Entries

Figure 4.14a shows a flowchart outline which contains a block with
only Entry List entries. Block Q receives control from location i, but
transfers control directly to the next sequential block. Figures 4.14b
and 4.14c show the Sorted Entry and Exit Lists. If Block P has already
been formed, then the arrows on the two lists point to the current list
entries. Block Q has only entries because the list entries satisfy the
third set of relationships shown above, i.e. j = i + 1 and l > k. The
START of Block Q is j, and the END is k - 1. Figures 4.14c, d, and e
show the Control Tables for Block Q. Since Block Q exits directly to
the next block, an exit is inserted from location k - 1 to location k.
Thus Block Q has one To Table entry, k. Note that Block R has two From
Table entries, b and k - 1. Since there is one Entry List entry with
an "Entry Point" portion of j, there is one From Table entry, i.

Figure 4.15a shows a flowchart outline which contains a block with
neither entry nor exit points. Figures 4.15b and 4.15c show the Sorted
Entry and Exit Lists. If Block P has already been formed, then the
arrows on the two lists point to the current list entries. Block Q has
neither entries nor exits because the list entries satisfy the fourth
set of relationships shown above, i.e. j ≠ i + 1 and j < k. The START
of Block Q is i + 1, the END is j - 1. There are no To or From Table
entries.



Figure 4.14 - A Block with only Entry Point Entries

Figure 4.15 - A Block with neither Entry Point nor Exit Point Entries



4.3.4 Interconnecting the Blocks

In the previous section, techniques for breaking the program into
blocks and constructing the Control Tables were described. Now, these
tables must be checked to ensure that the blocks have been properly
and totally interconnected so that all program blocks are used in the
analysis. The purpose of this section is to discuss techniques for
testing block interconnections, detecting isolated or improperly
connected blocks, and correcting improper block connections.

The program being analyzed must be assumed to be a "well connected" 
program where each program block can be reached from at least one of the 
program starting blocks. (A subroutine can have any number of starting
blocks or entry points.) If a block cannot be reached from a starting
block, there must be some reason for its isolation. Detecting isolated 
blocks first requires constructing a list of blocks which can be 
reached from one of the starting blocks and then determining which 
blocks are missing from this reachable block list. 

As each isolated block is detected, the reason for its isolation
must be determined and its Control Table entries corrected. If the
block should be isolated, its Topology Table entry is flagged as such. 
However, if the block should not be isolated, the proper assumed 
connection branches must be inserted into the Control Tables to make the 
isolated block reachable from its true predecessor blocks. After the 
Control Tables have been corrected for the isolated block, a new list 
of reachable blocks is constructed; and the detection procedure is 
repeated. This detection and correction procedure is repeated until 
all blocks are either reachable or flagged as truly isolated. Because
of the generality of assembly-language programming, there are many
different reasons for isolated blocks. It is at this point that 
individual algorithms must be developed for each class of reasons. 
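
The detection and correction loop described above can be sketched as
follows. This is only a minimal illustration in Python, not the analysis
program itself: the dictionary form of the To Table, the block names, and
the `classify` callback (which stands in for the individual algorithms
for each class of reasons) are assumptions made for the example.

```python
from collections import deque

def reachable_blocks(to_table, starting_blocks):
    """Return the set of blocks reachable from any starting block."""
    seen = set(starting_blocks)
    queue = deque(starting_blocks)
    while queue:
        block = queue.popleft()
        for successor in to_table.get(block, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen

def isolated_blocks(to_table, starting_blocks):
    """Blocks known to the Control Tables but missing from the reachable list."""
    all_blocks = set(to_table)
    for successors in to_table.values():
        all_blocks.update(successors)
    return sorted(all_blocks - reachable_blocks(to_table, starting_blocks))

def interconnect(to_table, starting_blocks, classify):
    """Repeat detection and correction until every block is reachable or flagged.

    classify(block) is a hypothetical callback returning either
    ("isolated", reason) or ("connect", predecessor_block).
    """
    flags = {}
    while True:
        pending = [b for b in isolated_blocks(to_table, starting_blocks)
                   if b not in flags]
        if not pending:
            return flags
        block = pending[0]
        kind, detail = classify(block)
        if kind == "isolated":
            flags[block] = detail                # e.g. "data" or "calling sequence"
        elif block in to_table.get(detail, []):
            flags[block] = "unresolved"          # branch already present; avoid looping
        else:
            to_table.setdefault(detail, []).append(block)  # assumed branch

# Hypothetical To Table: block name -> blocks it transfers control to.
to_table = {"P": ["R"], "Q": ["R"], "R": ["S"], "S": [], "D": []}
print(isolated_blocks(to_table, ["P"]))          # ['D', 'Q']
```

A new reachable block list is built on every pass, so a single corrected
branch can make several previously isolated blocks reachable at once.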

Probably the most common reason why a block should be isolated
is that it contains data or storage pseudo-operations and is not meant 
to be executed. (Of course, there will always be the programmer who, 
for reasons known only to himself, uses data or storage pseudo-operations 
to generate executable code.) Figure 4.16a shows the structure of a 
program containing such a block. If this type of block is found missing 
from the reachable block list, its reason for being isolated can be 
verified as follows. First, the Data and Storage Lists are scanned to 
see if they contain at least one entry whose program location places it 
within the isolated block. Second, the Control Table entry of the block 
preceding the isolated block is checked to see if it is terminated by 
a single absolute transfer. If both conditions are satisfied, the block 
is truly isolated; and a data or storage flag is set in its Topology 
Table entry. 
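
The two-condition check for a data or storage block might look like the
sketch below. The argument shapes are assumptions for illustration: a
block is a (START, END) pair, the Data and Storage Lists are reduced to a
list of program locations, and the preceding block's exits are given as
a list of transfer kinds.

```python
def is_data_block(block, data_storage_locations, preceding_exit_kinds):
    """Return True if an isolated block can be flagged as data or storage."""
    start, end = block
    # Condition 1: at least one Data or Storage List entry lies inside the block.
    holds_data = any(start <= loc <= end for loc in data_storage_locations)
    # Condition 2: the preceding block ends in a single absolute transfer,
    # so control can never fall through into this block.
    bypassed = list(preceding_exit_kinds) == ["absolute"]
    return holds_data and bypassed

print(is_data_block((100, 104), [102], ["absolute"]))     # True
print(is_data_block((100, 104), [102], ["conditional"]))  # False
```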

Another common reason why a block should be isolated is that it 
contains a subroutine calling sequence. Figure 4.16b shows the structure 
of a program containing such a block. If this type of block is found 
missing from the reachable block list, the To Table entry of the preceding
block must show that it is terminated by a subroutine call. Because
of the generality of assembly-language programming, a calling sequence
can contain any type of instruction or pseudo-operation. At this stage
of analysis, the isolated block can only be flagged as an assumed
calling sequence. In a later iteration after the subroutine return
approximations have been verified, the interaction between subroutine
calling sequences and subroutine returns can be used to verify the
flagging of blocks as calling sequences.

Figure 4.16 - Isolated Blocks

One common use of the calculated transfer is in a dispatch table. 
A dispatch table is a sequential set of blocks where the first block is 
terminated by a tagged transfer and the other blocks are terminated by 
an absolute transfer. The contents of the index register of the tagged 
transfer are used to determine which dispatch table block receives 
control from the tagged transfer block. Figure 4.16c shows a program 
containing a dispatch table. When the program is broken apart, the 
blocks in the dispatch table are formed as a function of exit points 
alone, because the entry points of the tagged transfer are missing. 
Thus, no connections are made between the tagged transfer block and the 
dispatch blocks. When a reachable block list is constructed, the dispatch
blocks and those connected to them are missing. Therefore, assumed
branches must be inserted into the Control Tables to interconnect the
tagged transfer block and the dispatch blocks as shown by the dashed
arrows in Figure 4.16c. These assumed branches permit the analysis 
program to reach the blocks which are connected to the dispatch blocks. 
In a later iteration after the set of possible index register values 
has been determined, the assumed branches can be verified. 
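
A rough sketch of inserting the assumed dispatch branches is given below.
It assumes, purely for illustration, that the blocks are available in
program order as (name, exit kind) pairs and that the dispatch table ends
at the first block not terminated by an absolute transfer; the text does
not state how the extent of the table is recognized.

```python
def assumed_dispatch_branches(blocks):
    """Return (tagged transfer block, dispatch block) pairs to insert.

    blocks: list of (name, exit_kind) in program order, where exit_kind is
    "tagged", "absolute", or anything else (hypothetical labels).
    """
    branches = []
    head = None                      # current tagged transfer block, if any
    for name, exit_kind in blocks:
        if exit_kind == "tagged":
            head = name              # first block of a dispatch table
        elif exit_kind == "absolute" and head is not None:
            branches.append((head, name))   # assumed branch, verified later
        else:
            head = None              # the sequential run has ended
    return branches

blocks = [("T", "tagged"), ("D1", "absolute"), ("D2", "absolute"),
          ("N", "fallthrough"), ("X", "absolute")]
print(assumed_dispatch_branches(blocks))    # [('T', 'D1'), ('T', 'D2')]
```

The returned pairs are only assumed branches; once the possible index
register values are known, pairs that cannot occur would be removed.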



4.4 THE DATA FLOW SOLUTIONS 

This section presents the solution techniques used to solve the 
data flow problems discussed in Chapter 3. First, the data flow graph 
data structure is presented so that the end result is known in advance.
This discussion includes the desired structure properties and a structure 
which incorporates those properties. Second, the solution techniques 
used to bootstrap through the dependent instruction interaction cycle 
are presented. These techniques include generating the active and 
passive references, finding the latest reference sets, saving the 
latest reference information, and constructing the functional expressions. 



4.4.1 The Data Graph Data Structure 

The data structure which contains the data flow information must 
have three characteristics. First, the data flow information should 
be incorporated into the control flow structure so that the latest 
reference searches can be easily performed. Second, the structure should 
permit the Active and Passive Reference List entries to be associated 
with the block in which they occur in order to facilitate the latest 
reference searches. Third, the structure should retain the latest
reference information in such a way as to provide for passing the
functional expressions generated by each active reference on to those
passive references which will need the expressions.

Figure 4.17 - The Enlarged Topology Table

The first two desired characteristics can be accomplished by
enlarging the Topology Table entry for each block to include "catalogue
cards" for data flow information. Figure 4.17 shows the enlarged
Topology Table block entry. The construction and interpretation of the
new table words are the same as before, i.e. the left side gives the
table entry count, and the right side points to those entries in the 
given table. The third characteristic can be fulfilled by properly 
constructing the four new tables, i.e. the Active, Passive, Latest, 
and User Tables. The format and construction of these tables will 
be introduced as they are needed. 
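
One way to picture the enlarged entry is the sketch below. The field
names are hypothetical, and the sketch assumes that the entries belonging
to one block are stored contiguously in each table, so that a count and a
pointer to the first entry are enough to recover them.

```python
from dataclasses import dataclass, field

@dataclass
class TableWord:
    count: int = 0   # left side: number of entries for this block
    first: int = 0   # right side: index of the block's first entry in the table

@dataclass
class TopologyEntry:
    start: int
    end: int
    to: TableWord = field(default_factory=TableWord)
    from_: TableWord = field(default_factory=TableWord)
    active: TableWord = field(default_factory=TableWord)
    passive: TableWord = field(default_factory=TableWord)
    latest: TableWord = field(default_factory=TableWord)
    user: TableWord = field(default_factory=TableWord)

def entries_for(table, word):
    """Return the slice of a table that one table word describes."""
    return table[word.first:word.first + word.count]

active_table = [("f", "AC", 10), ("f", "B", 11), ("f", "AC", 20)]
block = TopologyEntry(start=10, end=11, active=TableWord(count=2, first=0))
print(entries_for(active_table, block.active))
```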



4.4.2 Generating the Active and Passive References 

The purpose of the Reference List entries is to tell the later 
analysis phases what and where information is changed, used or needed. 
The Data Generation and the Data Processing Phases construct the
individual active and passive reference entries. The Data Reduction
Phase uses the active references to find the latest reference sets for 
each passive reference. The Functional Generation Phase uses the latest 
references to construct the functional expressions for inside the 
flowchart boxes. If this processing chain is to be successful, the 
initial Reference Lists must be properly constructed. 

During the Data Gathering Phase, entries are added to the
temporary Reference Lists whenever an instruction which changes or uses
information is found. If the data structure of these lists is to
conform with the general solution philosophy discussed earlier, the
structure must permit individual entries to be added as required but
yet allow all entries to be processed as a group.

These characteristics can be incorporated into two lists, the
Active Reference List and the Passive Reference List. The Active 
Reference List contains the active reference entries, and the Passive 
Reference List contains the passive reference entries. The format of 
the list entries is shown in Figure 4.18. The "f" portion of each 
entry retains information about the function or purpose of the instruction 
which generated the reference entry, e.g. remembers that the instruction 
was a plain STA instruction which changes only the address portion of 
the "Cell Changed"; was a CLA instruction with a symbolic operand address 
of **; or was a tagged STO instruction. The "Cell Changed" portion of 
an active reference entry is the cell number of the cell changed by the 
active reference. (For bookkeeping purposes, the central processor 
registers are also assigned cell numbers.) The "Cell Used" portion of 
a passive reference entry is the cell number of the cell used by the 
passive reference. The "Instruction Cell" portion of both entry types 
is the cell number of the cell which contains the instruction which 
generated the entries. 
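
As an illustration of these entry formats, the sketch below generates
reference entries for a few independent instructions. The opcode table is
an assumption for the example (only the CLA and STO entries are confirmed
by the later worked example), and the "f" portion is reduced to the bare
operation code.

```python
from typing import NamedTuple

class ActiveRef(NamedTuple):
    f: str                 # function of the generating instruction
    cell_changed: str
    instruction_cell: int

class PassiveRef(NamedTuple):
    f: str
    cell_used: str
    instruction_cell: int

# opcode -> (cells changed, cells used); "Y" stands for the operand address.
# Illustrative table, not the full instruction set.
OUTCOMES = {
    "CLA": (["AC"], ["Y"]),          # load AC from Y
    "STO": (["Y"], ["AC"]),          # store AC into Y
    "ORA": (["AC"], ["AC", "Y"]),    # OR Y into AC
}

def generate_references(location, opcode, operand):
    """Return (active entries, passive entries) for one independent instruction."""
    changed, used = OUTCOMES[opcode]

    def resolve(cell):
        return operand if cell == "Y" else cell

    active = [ActiveRef(opcode, resolve(c), location) for c in changed]
    passive = [PassiveRef(opcode, resolve(c), location) for c in used]
    return active, passive

print(generate_references(10, "CLA", "A"))
print(generate_references(11, "STO", "B"))
```

The number of entries produced depends only on the operation code, which
is why a lookup table is a natural fit for independent instructions.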

Generating the Active and Passive Reference List entries involves
detecting all referencing instructions and determining their outcome
sets. The outcome of an independent reference instruction can be
determined from the instruction itself. Figures 4.19 and 4.20 show
examples of list entries generated by independent instructions during the
Data Gathering Phase. The number of entries made for each instruction
is a function of its operation code. As in the case of control
instructions, there is a small but important percentage of referencing
instructions which are dependent and whose outcome sets cannot be determined
by the Data Gathering Phase. Now, three such dependent instructions are
discussed to indicate how their Active and Passive Reference List entries
are generated. This discussion shows how special Reference List entries
are used to initiate special procedures to handle dependent instructions.

Figure 4.18 - The Active and Passive Reference List Formats

The first example of a dependent reference instruction is the
changed address instruction. Figure 4.21a shows how a changed address
instruction might occur in a program. Since the Data Generation Phase
has no way of knowing in advance that the instruction at location B is
modified, the Data Generation Phase generates the normal Reference List
entries for that instruction as shown in Figures 4.21b and 4.21c. The
latter figure shows a passive reference with an unknown "Cell Used"
portion because of the double asterisk in the instruction at location B.
During the Data Processing Phase, the active reference to the instruction
at location B by the STA instruction is detected. Thus, the analysis
program must find the functional expression for the "Cell Used" by
location B before it can find the functional expression for the information
processing performed by that instruction. This order of functional
determination can be initiated by setting a special flag in the "f"
portion of the passive reference entries for the modified or changed
instruction. Therefore, the latest reference searching procedure can
detect the changed instruction flag and can initiate the proper search
procedure.

Figure 4.19 - The Reference List Entries of the CLA Instruction

Figure 4.20 - The Reference List Entries of the ORA Instruction

Figure 4.21 - The Reference List Entries for a Changed Address Instruction

Figure 4.22 - The Reference List Entries for an Indirectly Addressed Instruction
   a. The Program: CLA* A
   b. The Active Reference List: f, AC, B
   c. The Passive Reference List: f, A, B

The second example of a dependent reference instruction is the 
indirectly addressed instruction. Figure 4.22a shows how an indirectly
addressed instruction might occur in a program. Because the indirect
address asterisk can be detected while the instruction line is being
decoded, the Data Generation Phase knows it has an indirectly addressed
instruction and can generate the proper Reference List entries. For
such an instruction, the analysis program must first find the functional
expression for the address portion of the cell specified by the operand 
address of the instruction before it can determine the functional 
expression for the information processing performed by the instruction. 
In Figure 4.22a at location B, the address portion of cell A must be 
found before the contents of the AC can be determined. This order of 
functional generation can be initiated by constructing a passive reference 
entry whose "f" portion indicates an indirect instruction. Therefore, 
the latest reference searching procedure can detect the indirect flag 
and initiate the proper search procedure. 

Figure 4.23 - The Reference List Entries for a Tagged Instruction

The third example of a dependent reference instruction is the tagged
instruction. Figure 4.23a shows how a tagged address instruction might 
occur in a program. Again, because the presence of a tag can be detected 
while the instruction line is being decoded, the Data Generation Phase 
knows it has a tagged instruction and can generate the proper Reference 
List entries. For such an instruction, the analysis program must first 
find the functional expression for the index register specified by the 
tag before it can determine the functional expression for the information 
processing of the tagged instruction. In Figure 4.23a at location B, the 
contents of the index register must be found before the contents of the 
AC can be determined. This order of functional generation can be initiated 
by constructing a passive reference entry whose "f" portion indicates a 
tagged instruction and an index number. Therefore, the latest reference 
searching procedure can detect the tag flag and initiate the proper 
search procedure. 



4.4.3 Finding the Latest Reference Sets

In the previous section the motivation and technique for constructing 
the necessary Reference List entries were described. The function of 
these entries is to ensure that the analysis program can decide the
sequence in which it needs to determine the information processing of 
the program. The purpose of this section is to explain how the analysis 
program decides which latest reference searches are required and how 
the program performs those searches. 

After all the Reference List entries have been made by the Data 
Gathering and Data Processing Phases, the Data Reduction Phase associates 
each Reference List entry with the program block in which the reference
occurs. First, the Reference Lists are sorted on their "Instruction 
Cell" portion to place them in the same sequence as the Topology Table. 
Second, the Active and Passive Reference Lists are scanned, and their 
entries placed into the Active and Passive Reference Tables. Figure 4.24a 
shows an example program block and its instructions. Figures 4.24b and 
4.24c show the Sorted Active and Sorted Passive Reference Lists for the 
example block. Figure 4.24d shows how the entries of those two lists 
would be placed in the Reference Table. 

Figure 4.24 - Topology Table with Active and Passive Entries

The latest reference searching procedure must find all the latest
references for each Passive Reference Table entry. The search procedure
should be performed iteratively but yet be able to decide the search
sequence and handle any program topology, such as loops or parallel paths.
The search sequence for each passive reference is dictated by the special
flags set in the "f" portion of the passive reference entry. The search
procedure involves searching back from each passive reference entry on 
all control paths until each path is terminated by a matching active 
reference entry; the initial passive reference entry; or a previously 
searched block. A matching active reference is an active reference 
whose "Cell Changed" portion matches the "Cell Used" portion of the 
initial passive reference. The "f" portion of each passive reference 
entry states which cell bits are used by the passive reference; the "f" 
portion of each active reference entry states which cell bits are 
changed by the active reference. Thus, the latest reference searching 
procedure is capable of detecting partial bit matches and can continue 
searching along a path until all "Cell Used" bits have been matched by 
"Cell Changed" bits. 

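The backward search with partial bit matching can be sketched as follows.
This is a simplified illustration under stated assumptions: the "f"
portions are reduced to bit masks, the 36-bit word and 15-bit address
masks are chosen only for the example, and a block is treated as already
searched when it has been visited with the same set of unmatched bits.

```python
WORD = (1 << 36) - 1      # all bits of a cell (illustrative width)
ADDRESS = (1 << 15) - 1   # address portion (illustrative width)

def latest_references(actives, preds, block, instr_cell, cell_used, bits_used):
    """Return the instruction cells of the latest references for one passive reference.

    actives: block -> list of (cell_changed, bits_changed, instruction_cell),
             in program order
    preds:   block -> list of predecessor blocks (the From Table)
    """
    found = set()
    searched = set()

    def scan(entries, remaining):
        # Walk backward through a block, consuming bits as they are matched.
        for cell_changed, bits_changed, cell in reversed(entries):
            if cell_changed == cell_used and bits_changed & remaining:
                found.add(cell)
                remaining &= ~bits_changed
                if not remaining:
                    break
        return remaining

    def search(current, remaining):
        for pred in preds.get(current, ()):
            if pred == block:
                # The path has looped back to the initial passive reference:
                # scan only from that instruction to the end of its block.
                tail = [a for a in actives.get(block, []) if a[2] >= instr_cell]
                scan(tail, remaining)
            elif (pred, remaining) not in searched:
                searched.add((pred, remaining))
                left = scan(actives.get(pred, []), remaining)
                if left:
                    search(pred, left)

    head = [a for a in actives.get(block, []) if a[2] < instr_cell]
    left = scan(head, bits_used)
    if left:
        search(block, left)
    return sorted(found)

# Hypothetical program: P stores a full word into X, Q then changes only the
# address portion of X, and R uses the whole of X. Paths: P -> Q -> R, P -> R.
actives = {"P": [("X", WORD, 5)], "Q": [("X", ADDRESS, 12)], "R": []}
preds = {"Q": ["P"], "R": ["Q", "P"]}
print(latest_references(actives, preds, "R", 20, "X", WORD))   # [5, 12]
```

In the example, the path through Q matches only the address bits at
location 12, so the search continues back to location 5 for the rest.
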
If the "f" portion of the passive reference indicates a changed
address, the latest references for the changed address must first be
determined. If the first search finds only one latest reference and
determines that the latest reference stores a constant into the changed
address, a second search can be performed to find the latest references
for the cell specified by the previously determined constant. Figure 4.25a
shows an example of a first search resulting in a constant. The first
latest reference search on the changed address instruction at location Y
indicates its true "Cell Used" is location Z. The second latest reference
search can be performed as if location Y was a CLA Z instruction. On
the other hand, if the first changed address search finds one or more
variable expressions for the changed address, no accurate second search
can be performed during the first iteration. Figure 4.25b shows an
example of a first search resulting in a variable. The first latest
reference search on location Y indicates that the address portion of Y
comes from the address portion of X. However, the address portion of X
is a variable because of the STA X instruction. Therefore, no second
search can be performed during the first iteration. Only an approximate
expression of the form, AC = C(a/X), can be produced as output for
location Y after the first iteration. In Iverson Notation (7), a/X
indicates the address portion of location X; and C(x) means "the contents
of location x".

Figure 4.25 - Programs with Changed Addresses 

If the "f" portion of the passive reference indicates an indirect
address, the address portion of the cell specified by the address of the
indirect instruction must first be determined. If the first search
finds only one latest reference to that cell and determines that the
latest reference stores a constant into that cell, a second search can
be performed to find the latest references for the cell specified by the
previously determined constant. Figure 4.26a shows an example of a first
search resulting in a constant. The first latest reference search on the
address portion of X indicates that it is a constant, Z. The second
latest reference search is performed as if location Y was a CLA Z
instruction. On the other hand, if the first search determines that one
or more variable expressions are stored into the address portion of the
location specified by the indirect instruction, no accurate second
search can be performed during the first iteration. Figure 4.26b shows
an example of a first search resulting in a variable. The first latest
reference search on location Y indicates that the address portion of X
comes from the address portion of W. Thus the address portion of X is
a variable because of the STA X instruction. Therefore, no second search
can be performed during the first iteration. Only an approximate
expression of the form, AC = C(a/X), can be produced as output for
location Y after the first iteration.

Figure 4.26 - Programs with Indirect Addresses

If the "f" portion of the passive reference indicates a tagged
reference, latest reference searches are performed on both the "Cell
Used" and the index register of the tagged instruction. The first search
determines the latest references for the "Cell Used", i.e. which instructions
could have last made entries into the table headed by the "Cell Used"
location. The second search determines the latest references for the
index register used by the tagged instruction, i.e. which instructions
last modified the index register. Two searches are performed because
there is little chance that the index register is a constant and that
the exact "Cell Used" can be determined by the first iteration.

Finally, if the "f" portion of the passive reference entry does 
not contain any special latest reference search flags, the latest 
reference search is performed directly on the "Cell Used" portion of 
the passive reference. 
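
The four cases above amount to a dispatch on the flags in the "f"
portion. The sketch below is one hedged way to write it down: the flag
names and the `address_outcomes` callback (standing in for the first
latest reference search) are hypothetical, and the outcomes are reduced
to ("constant", cell) or ("variable", expression) pairs.

```python
def cells_to_search(flag, cell_used, instruction_cell, address_outcomes,
                    index_register=None):
    """Return the cells on which the final latest reference search is run.

    An empty list means that no accurate second search is possible during
    this iteration and only an approximate expression can be output.
    """
    if flag == "tagged":
        # Two searches: the table head and the index register.
        return [cell_used, index_register]
    if flag not in ("changed", "indirect"):
        # No special flag: search directly on the "Cell Used".
        return [cell_used]
    # A changed address lives in the instruction itself; an indirect
    # address lives in the cell named by the operand address.
    pointer_cell = instruction_cell if flag == "changed" else cell_used
    outcomes = address_outcomes(pointer_cell)
    if len(outcomes) == 1 and outcomes[0][0] == "constant":
        return [outcomes[0][1]]
    return []

# First search yields the single constant Z (as in Figure 4.25a).
print(cells_to_search("changed", None, "Y", lambda cell: [("constant", "Z")]))
# First search yields a variable (as in Figure 4.25b): defer to a later iteration.
print(cells_to_search("changed", None, "Y", lambda cell: [("variable", "a/X")]))
```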



4.4.4 Saving the Latest Reference Information

After the latest reference set of a passive reference has been 
determined, the latest reference information must be saved in a data 
structure which permits the generation of functional expressions for each
instruction and the transmission of those expressions to other instructions. 
The purpose of this section is to discuss temporary list structures for 
latest reference information and the final Latest Reference Tables 
which fulfill the above requirements. 

If the data structure of the temporary Latest Reference Lists is to 
conform with the general solution philosophy discussed earlier, the 
structure must permit individual entries to be added as required but 
yet allow all entries to be processed as a group. These characteristics 
can be incorporated into two lists, the Latest Reference List and the 
User List. The Latest Reference List contains latest reference entries 
which remember the locations of all latest references for each passive 
reference. The User List contains user entries which remember
the locations of the passive references which will require the functional
expressions produced by each active reference. The format of the list
entries is shown in Figure 4.27.

The Latest Reference List entries are divided into three parts. The
first portion is the "Latest Reference Cell" and is the "Instruction
Cell" of the Active Reference Table entry which produced the match during
the latest reference search. The second portion is the "Cell Used" and
is the same "Cell Used" as in the passive entry which initiated the latest
reference search. The third portion is the "Instruction Cell" of the
passive entry and is used to identify which Latest Reference List entries
are associated with each Passive Reference Table entry.

The User List entries are also divided into three parts. The first
portion is the "Latest Reference Pointer" and points to the location
which contains its Latest Reference mate. The second portion is
the "Cell Changed" of the Active Reference Table entry which produced
the match. The third portion is the "Instruction Cell" of that Active
Reference Table entry and is used to identify which User List entries
are associated with each Active Reference Table entry.

Figure 4.27 - The Latest Reference and User List Formats
   a. The Latest Reference List:  LATEST REFERENCE CELL | CELL USED | PASSIVE INSTRUCTION CELL
   b. The User List:  LATEST REFERENCE POINTER | CELL CHANGED | ACTIVE INSTRUCTION CELL

As each latest reference search match is found, one entry is added 
to the Latest Reference List and the User List. The absence of Latest
Reference List entries for a passive reference indicates no latest 
references were found. Figure 4.28 shows the Latest Reference List 
and User List entries that would result for a program where a functional 
expression is needed by two instructions elsewhere in the program. The 
functional expression generated by location 11 is needed at locations 20 
and 30. At location 20 there is a passive reference to location B 
which has one latest reference at location 11. Thus, a single latest 
reference entry is added to the Latest Reference List showing the latest 
reference information, and one user entry is added to the User List. 
Likewise, at location 30 there is a passive reference to location B 
which has one latest reference at location 11. 

The temporary lists are transformed into the final Latest Reference 
Table and the User Table by associating each list entry with the program 
block in which it occurs. The Latest Reference List is sorted on its 
"Passive Instruction Cell" portion while the User List is sorted on its 
"Active Instruction Cell" portion. The resulting list entries are 
associated with the blocks in which they occur by scanning the ordered 
lists and constructing the "Latest" and "User" entries in the Topology 
Table. Figure 4.29 shows the resulting tables for the example shown 
in Figure 4.28. 
Figure 4.28 - A Program where Symbolic Results are needed at Two Later Points
in the Program
   Panels: a. The Program; b. Active Table; c. Passive Table; d. Latest List; e. User List

   b. Active     c. Passive    d. Latest        e. User
      Table         Table         List             List
   f, AC, 10     f, A, 10
   f, B, 11      f, AC, 11     10, AC, 11  <--  Pointer, AC, 10
   f, AC, 20     f, B, 20      11, B, 20   <--  Pointer, B, 11
   f, AC, 30     f, B, 30      11, B, 30   <--  Pointer, B, 11

Figure 4.29 - The Final Data Flow Tables

In summary, each Latest Reference Table entry points from a passive
reference back to an active reference which is a member of the passive 
reference's latest reference set. Each User Table entry points from 
an active reference forward to a passive reference which will need the 
functional expression generated by the active reference. 
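
The pairing of Latest Reference and User entries can be illustrated with
the entries of Figure 4.28. In this sketch the "Latest Reference Pointer"
is modelled as a Python reference to the mate entry, and entries are
plain lists and tuples; both choices are assumptions for the example.

```python
def record_match(latest_list, user_list, active, passive):
    """Add one Latest Reference entry and its User entry for a search match.

    active and passive are (f, cell, instruction_cell) reference entries.
    """
    _f_active, cell_changed, active_cell = active
    _f_passive, cell_used, passive_cell = passive
    # Latest Reference Cell, Cell Used, Passive Instruction Cell
    latest = [active_cell, cell_used, passive_cell]
    latest_list.append(latest)
    # Latest Reference Pointer, Cell Changed, Active Instruction Cell
    user_list.append((latest, cell_changed, active_cell))

latest_list, user_list = [], []
record_match(latest_list, user_list, ("f", "AC", 10), ("f", "AC", 11))
record_match(latest_list, user_list, ("f", "B", 11), ("f", "B", 20))
record_match(latest_list, user_list, ("f", "B", 11), ("f", "B", 30))

# Order each list so its entries can be attached to the blocks they occur in.
latest_list.sort(key=lambda entry: entry[2])   # "Passive Instruction Cell"
user_list.sort(key=lambda entry: entry[2])     # "Active Instruction Cell"

print(latest_list)
print([(cell, instr) for _pointer, cell, instr in user_list])
```

Because each user entry holds a reference to its mate rather than a copy,
sorting the two lists independently does not break the pairing.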



4.4.5 Constructing the Functional Expressions

A functional expression is generated for each active reference entry. 
The instruction operation code retained in the "f" portion of the active 
reference entry dictates its functional expression format. As the 
construction of a new expression begins, the expression format is found 
by extracting the instruction operation code from the active reference 
"f" portion and using the code as a table lookup pointer for the Format 
Table. The Format Table entry for each instruction indicates the 
functional expression format for each of the active references of the 
instruction. The table entry includes the number of entries to be 
expected in the Active and Passive Reference Tables and the operator 
symbols to be used in constructing the functional expressions. Whenever 
possible, a latest reference expression already generated for a previous 
active reference is substituted for each passive reference in the new 
active reference expression. Now, functional expression construction
is discussed in detail using the program in Figures 4.28 and 4.29 as
an example. First, the discussion will explain how the functional
expressions, AC = A and B = A, are constructed for locations 10 and 11.
Second, the discussion will outline how the expression, B = A, is
transmitted from location 11 to locations 20 and 30.

The Active Reference Table entry for location 10 in Figure 4.29 is
"f,AC,10" where "f" indicates a CLA instruction. The Format Table entry
for a CLA instruction indicates an expression format of:

"CELL CHANGED" = "LATEST EXPRESSION" (or "CELL USED" if no latest expression)

The Passive Reference Table entry for the CLA instruction is found by
finding a matching "Instruction Cell" value of 10. In this case the
passive entry is "f,A,10". The Latest Reference Table entries for this
passive reference are found by matching the two right-hand portions of
each entry. In this case, there are no latest reference entries for
location 10. Thus, the functional expression for location 10 is AC = A.

The new functional expression is held for final output processing
by adding it to the functional Output List. Figure 4.30 outlines the
data structure of the Output List. The final output processing will need
to sequence the functional expression strings according to instruction
location. To facilitate this resequencing, a message pointer is constructed
and added to the Message Pointer List. The left half of each Message
Pointer List entry indicates the instruction location to which its
expression applies, and the right side points to the expression itself.
Thus, the Output List expressions are ordered by sorting the single
entry Message Pointer List instead of the variable length entry Output
List.

Figure 4.30 - The Functional Expressions on the Output List
   a. Message Pointer List
   b. Output List

Once the functional expression is constructed, the User Table must 
be checked to see if any instructions further on in the program need this 
expression. The user entry for the active reference entry is found by 
matching the two right-hand portions of each entry. In this case 
there is one user entry, "Pointer, AC, 10". This user entry states 
that the latest reference entry at the end of the pointer wants to know 
the just derived expression for the AC. The analysis program follows 
the pointer to its Latest Reference Table entry mate, "10,AC,11". Once 
the entry is found, its "Latest Reference Cell" portion is replaced by 
a pointer to the just constructed functional expression on the Output 
List. Also, the latest reference entry is flagged as having an expression 
pointer. Now, the latest reference entry at location 11 knows where the
functional expression for the AC can be found, i.e. the functional 
expression has been transmitted from location 10 to location 11. 

The active entry for location 11 in Figure 4,29 is "f ,B,11" where 
"f" indicates a STO instruction. The Format Table entry for the STO 
instruction indicates an expression format of: 

"CELL CHANGED" = "LATEST EXPRESSION" (or "CELL USED" if no latest expression) 

The Passive Reference Table entry for the STO instruction is "f,AC,11".
One matching latest reference entry, "Expression Pointer,AC,11", is found.
This is the latest reference entry that was found by following the pointer 
of the previous user entry. The functional expression for location 11 
is constructed by first adding the "Cell Changed" to the Output List. 
In this case, the "Cell Changed" is B. Next, the symbolic equal sign 
is added to the Output List. Finally, the expression pointer of the 
latest reference entry is followed to its functional expression, AC = A. 
The expression is scanned until the equal sign is found, and the 
remaining entries after the equal sign are copied onto the Output List. 
Thus, the expression, B = A, is generated for location 11. 
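A minimal sketch of applying the STO format is given below. The token-list representation and the function name `build_store_expression` are assumptions made for illustration only.

```python
def build_store_expression(cell_changed, cell_used, latest_expression=None):
    """Apply the STO format: "CELL CHANGED" = "LATEST EXPRESSION",
    or "CELL USED" when no latest expression is available."""
    if latest_expression is None:
        return [cell_changed, "=", cell_used]
    # Scan to the equal sign and copy the entries that follow it.
    right_side = latest_expression[latest_expression.index("=") + 1:]
    return [cell_changed, "="] + right_side


# Location 11: STO B, with the latest expression for the AC being AC = A.
expression = build_store_expression("B", "AC", ["AC", "=", "A"])
print(" ".join(expression))
```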

Finally, two identical user entries (Pointer,B,11 and Pointer,B,11)
are found for location 11 in Figure 4.29. Each of the entry pointers
is followed to its latest reference mate. Each "Latest Reference Cell"
portion is replaced by a pointer to the just derived functional expression,
B = A, on the Output List; and the latest reference entry is flagged
as having an expression pointer. Thus, when locations 20 and 30 are
reached, the functional expression for location B is available.
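The forwarding of a finished expression to its users might be modeled as below. The record layouts and field names are hypothetical; the thesis tables pack the same three portions into machine words.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LatestReference:
    """Hypothetical Latest Reference Table entry, e.g. "11,B,20"."""
    latest_cell: Optional[int]   # location of the instruction that set the cell
    cell: str                    # cell being referenced
    location: int                # location of the instruction that needs it
    expression_pointer: Optional[int] = None
    has_expression: bool = False


@dataclass
class UserEntry:
    """Hypothetical User Table entry, e.g. "Pointer,B,11"."""
    pointer: LatestReference     # the latest reference "mate"
    cell: str
    location: int                # location whose expression is wanted


def forward_expression(user_table: List[UserEntry], cell: str,
                       location: int, output_index: int) -> int:
    """Give the expression just built for (cell, location) to every user."""
    forwarded = 0
    for user in user_table:
        if user.cell == cell and user.location == location:
            mate = user.pointer
            mate.latest_cell = None                  # cell portion is replaced
            mate.expression_pointer = output_index   # by the expression pointer
            mate.has_expression = True
            forwarded += 1
    return forwarded


# Locations are written in octal, as in the flowcharts.
ref_20 = LatestReference(latest_cell=0o11, cell="B", location=0o20)
ref_30 = LatestReference(latest_cell=0o11, cell="B", location=0o30)
users = [UserEntry(ref_20, "B", 0o11), UserEntry(ref_30, "B", 0o11)]
count = forward_expression(users, "B", 0o11, output_index=1)
```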






CHAPTER 5 



AUTOMATIC PROGRAM ANALYSIS EXAMPLES 



In the previous chapter the approximation procedures used by the 
first iteration to bootstrap itself through the control flow - data 
flow interaction cycle were shown. This outline described the data 
acquisition and data processing sequence and showed the use of inter- 
mediate data flow analysis results to improve control flow approximations 
and vice versa. In addition, a detailed presentation described how the 
control and data flow steps handled the dependent instructions. 

This chapter displays the results of applying the existing automatic 
analysis system to example programs. First, the layout and symbols of 
the output flowcharts are explained. Second, flowcharts of programs 
containing dependent instructions are described. Third, flowcharts of 
programs containing other analysis problems are presented. All output 
examples were automatically produced on-line by an IBM 1052 printer 
keyboard connected to the Project MAC IBM 7094 time-shared computer (2). 
Because the IBM 1052 printer does not normally contain the complete 
Iverson Notation character set (7), character substitutions have been 
made. 
5.1 THE FLOWCHART FORMATS

The analysis program should display its results in a form suitable 
for human use. Because the flowchart has become a standard vehicle for 
program documentation, it is also used here. Currently, the analysis 
system has two levels of flowchart detail: the Topological Flowchart
and the Detailed Flowchart. The Topological Flowchart presents the
control flow of a program by displaying its block execution sequence.
The Detailed Flowchart exhibits both control and data flows by displaying
the block execution sequence, the functional expressions, and
pertinent cross reference information. One example of each flowchart
type is discussed in detail so that only the highlights of later 
examples need to be explained. 



5.1.1 The Topological Flowchart 

Figure 5.1 exhibits an example of a Topological Flowchart. The
program always starts at Block 1. The asterisks represent the
instructions contained within a block. The number at the upper left of each
block is its Block Number. The dots represent control flow paths.
The block inputs always enter at the top of the block; the outputs 
always exit at the bottom. No attempt has been made to minimize line 
crossings by rearranging blocks. Now, the interpretation of the flowchart 
symbols of Figure 5.1 is given. 

Figure 5.1 - A Topological Flowchart

Block 1 is the starting block and exits to either Block 2 or Block 4.

Block 2 can be reached from Block 1 and has an "E4" exit. The "E"
designates an exit to an external subroutine; the "4" indicates
that the external subroutine returns control to Block 4.

Block 3 is unreachable and has no exits. Because it follows an
external subroutine exit, it is probably a subroutine
calling sequence.

Block 4 can be reached from Block 1 and "E2". The "E" signifies
an entry from an external subroutine; the "2" denotes that
the external subroutine is called by Block 2. Block 4 has
an "I5" exit. The "I" specifies an exit to an internal
subroutine; the "5" reveals that the internal subroutine
returns control to Block 5.

Block 5 can be reached by an "I4" entry which denotes a return
from an internal subroutine called at Block 4. Block 5
has a "NR" exit which indicates a non-returning external
subroutine call.

Block 6 can be reached by an "IE4" entry where the "IE" designates
an internal subroutine entry and the "4" reveals that the
subroutine is called by Block 4. Block 6 exits to Block 7.

Block 7 can be reached from itself or Block 6 and exits to Block 10
or to itself.

Block 10 can be reached from Block 7 and exits via an "IR". The
"IR" specifies an internal subroutine return, such as a
TRA 1,4.

Block 11 appears to be a data and storage area.
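The entry and exit codes described above can be summarized by a small decoder. This is a reader's sketch, not part of the analysis system; the function name `describe` and the wording of the returned phrases are assumptions.

```python
import re


def describe(code: str, direction: str) -> str:
    """Translate a Topological Flowchart code such as "E4" or "IE4".

    direction is "entry" (top of a block) or "exit" (bottom of a block).
    """
    if direction not in ("entry", "exit"):
        raise ValueError(f"unknown direction: {direction}")
    leaving = direction == "exit"
    if code == "NR" and leaving:
        return "non-returning external subroutine call"
    if code == "IR" and leaving:
        return "internal subroutine return"
    match = re.fullmatch(r"(IE|E|I)?(\d+)", code)
    if match is None:
        raise ValueError(f"unrecognized {direction} code: {code}")
    prefix, block = match.groups()
    if prefix is None:
        return f"to Block {block}" if leaving else f"from Block {block}"
    if prefix == "E":
        if leaving:
            return f"call to an external subroutine that returns to Block {block}"
        return f"return from an external subroutine called by Block {block}"
    if prefix == "I":
        if leaving:
            return f"call to an internal subroutine that returns to Block {block}"
        return f"return from an internal subroutine called by Block {block}"
    if leaving:  # prefix == "IE" is only meaningful as an entry
        raise ValueError("IE is an entry code")
    return f"internal subroutine entry, called by Block {block}"


print(describe("E4", "exit"))
print(describe("IE4", "entry"))
```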
5.1.2 The Detailed Flowchart

Figure 5.2 shows an example of a Detailed Flowchart. The first 
three lines and the last line on the flowchart page were produced by 
the time-sharing monitor as it prepared the final analysis phase for 
execution. The left side of the output exhibits the original symbolic 
source instructions and their assigned core locations; the right side 
displays the flowchart box outlines, interconnections, and functional 
expressions. The Block Numbers are shown above each block. The 
starting and ending core locations of each block are shown on the left 
side of the block. The block inputs always enter at the top or upper 
right of the block; the outputs always exit at the bottom or lower right. 
The numbers to the right of the entering or exiting dots are Block Numbers 
to which or from which control is transferred. The expressions inside 
the flowchart boxes are the functional expressions. The expressions 
outside the boxes are cross reference expressions preceded by the
location number of the instruction which generated the expression. Now,
the flowchart symbols of Figure 5.2 are explained. 

Block 1 is the starting block and exits to either Block 2 or Block 3. 
The first instruction of Block 1 is at location 1; the last 
is at location 3. At location 1 the contents of location V 
are placed into the accumulator. At location 2 the contents 
of location V are moved to location W. The cross reference 
expression at the right of location 2 states that the AC was 
changed to the contents of V at location 1. The line for 
location 3 is blank because of unprogrammed subroutines. (See 
Appendix 1 for missing subroutine information.) If the 
Figure 5.2 - A Detailed Flowchart



r runs 000000 
VI 1907.0 
EXECUTION. 



01 CLA V 

2 STO \^ 

03 TZF Al 



1 
************************ 

1 * 

* AC=V * 

* v;=v * 

* * 

3 * *) 

************************ 



1 Ar=v 

3 



0t» 
05 



CLA 
STO 



************************ 
ti * * 

* AC=1 * 

* Y = l * 
5 * * 

************************ 



k AC = 1 



05 Al CLA W 
07 STA Z 
10 TSX $EXIT,!< 



************************ 
6 * *(...! 

* AC=V * 2 W=V 

* A/Z=A/V * 6 AC=V 

* * 

10 * *)... M^ 

************************ 



11 V BSS 1 

12 V/ BSS 1 
15 X OCT 1 
lit Y BSS 1 
15 Z BSS 1 



************************ 

11 * * 

* * 

* * 

* * 

* * 

* * 

15 * * 

************************ 



R 14.750 + 3.000 
programming were complete, the line would show V:0; and the
cross reference expression would state that the AC was V at
location 1.

Block 2 can be reached from Block 1 and exits directly to Block 3.
The first instruction of Block 2 is at location 4; the last
is at location 5. At location 4 the contents of X are placed
into the AC. Since the contents of X are constant, the
symbol X is replaced by its constant value, 1, in the
functional expression for location 4. At location 5 the constant,
1, is stored into location Y.

Block 3 can be reached from either Block 1 or Block 2 and has a
non-returning exit. The first instruction of Block 3 is
at location 6; the last is at location 10. At location 6,
the contents of location W, which are now the contents of
V, are placed into the AC. The cross reference expression
states that the contents of V were placed in W at location 2.
At location 7 the address portion of Z is replaced by the
address portion of V.

Block 4 is unreachable. It begins at location 11 and ends at
location 15. Block 4 is empty because it contains data
and storage locations.



5.2 FLOWCHARTS CONTAINING DEPENDENT INSTRUCTIONS 

The purpose of this section is to show examples of automatically 
produced flowcharts for programs containing dependent instructions. The 
example programs have been kept short so as to spotlight the individual 
dependent instructions. Instead of being viewed as programs in themselves,
the examples might be thought of as being imbedded in larger programs.
Since both the Topological and Detailed Flowchart conventions have been
discussed, only the pertinent results are explained in the following
examples.



5.2.1 The Transfer Switch 

Figure 5.3 shows the first example of a program containing a transfer
switch. At location 3 a passive reference is made to location 5 which
contains a transfer instruction, TRA END. At location 4 the transfer
switch is stored into location 3, i.e. A1. Thus, Block 1 is terminated
by a transfer switch, and control paths are generated from Block 1 to
Blocks 2 and 4. Note that the analysis program found that the transfer
instruction at location 5 can be executed in that location. Therefore,
Block 2 is terminated at location 5.

Figure 5.4 shows a second example of a program containing a transfer
switch. The analysis program found that the transfer instruction at
location 10 is not executed in its original core location. Thus,
location 10 is included in Block 3 as data.



5.2.2 The Subroutine Call and Return 

Figure 5.5 shows an example of a program containing subroutine calls 
and returns. At location 4 the internal subroutine, "IN", is called. 
The analysis program detects the internal subroutine entry point at
Figure 5.3 - A Transfer Switch Executed in its Original Location



r runs 000000 
W 11*23.14 
EXECUTION. 



01 CLA 

02 STO 

03 Al CAL 



************************ 
1 * * 

* AC=1 * 

* Y = l * 

* AC=TRA END * 
3 * *) 

************************ 



1 AC = 1 



Ok SLV? Al 

05 A TRA END 



************************ 
k * * 

* A1=TRA END * 

* * 

5 * *) 

************************ 



3 AC=TRA FMD 
It 



06 X OCT 

07 Y BSS 



************************ 
5 * * 



7 * * 

************************ 



10 END TSX $EXIT,i4 



************************ 
10 * *(... 1,2 

* * 

10 * *)... NR 

************************ 



R 5.283+2.833 
Figure 5.4 - A Transfer Switch Not Executed in its Original Location

r runS 000000 
W 161U.9 
EXECUTION. 

1 



01 CLA X 

02 STO Y 

03 Al CAL A 



************************ 
1 * * 

* AC=1 * 

* Y = l * 

* AC=TrvA END * 

5 * ♦). 

************************ 



1 AC=1 
I* 



Ok 
05 



SLW 

TRA 



Al 
END 



************************ 
k * * 

* A1=TRA EMD * 

* * 

5 * *), 

************************* 



3 AC=TRA END 
k 



06 X OCT 

07 Y BSS 
10 A TRA 



1 
1 
END 



************************ 
6 * * 



* * 

10 * * 

************************ 



11 END TSX $EXIT,i4 



************************ 

11 * *(... 1,2 

* * 

11 * *)... NP 

************************ 



R tt. 1(83 + 3. 150 
Figure 5.5 - Subroutine Calls and Returns

r runs 000000
W 1607.0
EXECUTION.



02 CLA A 

03 STA X 

01* TSX IN,«t 



************************ 

2 * * 

* AC^l * 

* A/X=A/1 * 2 AC«1 

* * 

It * *)... 13 

************************ 



05 X PZE 



************************ 
5 * * 

* * 

5 * * 

************************ 



************************ 
6 * *(...!! 

06 TSX $EXTERN,I» * * 

6 * *)... El* 

************************ 



************************ 
7 * *(... E3 

07 CLA A * AC«1 » 

10 STO B * 8=1 * 7 AC=1 

11 TSX $EXIT,l» * * 

11 * *)... NR 

************************ 



12 IN CLA 

13 STO 
lit TRA 



A 
B 
2,l» 



************************ 
12 * *(... lEl 

* AC=1 * 

* B=l * 12 AC=1 

* * 

Ik * *)... IR 

************************ 



15 A OCT 1 

16 B BSS 1 



************************ 

15 * * 

* * 

* * 

16 * * 
************************ 
location 12 and the return at location 14. Because the subroutine returns
via a TRA 2,4, location 5 is assumed to be a single instruction calling
sequence. External subroutines are called at locations 6 and 11.
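The calling sequence inference can be illustrated with a short sketch. It assumes the usual convention that a TSX at location L followed by a return of the form TRA n,4 resumes execution at L + n; the function name is hypothetical.

```python
def calling_sequence_locations(call_location: int, return_offset: int):
    """Locations skipped between a TSX and the point where TRA n,4 returns.

    Assumes control resumes at call_location + return_offset, so the words
    in between are taken to be the calling sequence.
    """
    return list(range(call_location + 1, call_location + return_offset))


# Figure 5.5: TSX IN,4 at location 4, and the subroutine returns via TRA 2,4.
sequence = calling_sequence_locations(0o4, 2)
print([f"{location:o}" for location in sequence])
```

A return of TRA 1,4 would leave no calling sequence, because control resumes at the word immediately after the call.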



5.2.3 The Calculated Transfer 

Figure 5.6a shows the Detailed Flowchart of a program containing a
tagged transfer instruction. The analysis program assumes that Blocks 2
and 3 can be reached from Block 1. The "L4" and "L5" entries to Block 1
indicate that they close control loops from Blocks 4 and 5. Likewise,
the "L1" exits from Blocks 4 and 5 specify loops back to Block 1.
Figure 5.6b shows the Topological Flowchart for the same program.



5.2.4 The Changed Address 

Figure 5.7 shows the first example of a program containing a changed 
address instruction. At location 1 the address portion of W, the constant 7, 
is stored into the address portion of location 2. When the instruction 
at location 2 is executed, it is a CLA 7. Therefore, the contents of 
location 7 or Z are placed into the AC at location 2. This address 
change can be traced during the first iteration because a single constant 
was used as the new address for the changed address instruction. 

Figure 5.8 shows the second example of a program containing a changed 
address. The address portion of location Y is used as the new address 
at location 4. In this case location Y appears to be a variable during
Figure 5.6a - A Calculated Transfer
r runs 000000 
W 1730.8 
EXECUTION. 



00 Al LXA A,l 

01 TRA *,1 



************************ 

♦ *{... LI»,L5 

* IXl-A/1 * 

* * 

1 * *)... 3 

************************ 



02 TRA B 



************************ 

2 * * 

* * 

2 ♦ *)... If 

************************ 



03 TRA C 



************************ 

3 * *(... 1 

* * 

3 • *)... 5 

************************ 



Ok B CLA 

05 STO F 

06 TRA Al 



************************ 
l| • *(... 2 

* AC»1 * 

* F«l * u AC-1 

* * 

6 • *).. . LI 

************************ 



07 C CLA E 

10 STO F 

11 TRA Al 



************************ 
7 * *(... 3 

* AC=2 * 

* F»2 * 7 AC-2 

* * 

11 ♦ *)... LI 

************************ 



12 A OCT 1 

13 D OCT 1 
l«f E OCT 2 
15 F BSS 1 



************************ 

12 * * 

* * 

* * 

* * 

* * 

15 * * 

************************ 
Figure 5.6b - The Topological Flowchart of the Program in Figure 5.6a



)*( 

* 

* 
*) 



* 

♦ 
(* 



*( 

* 
♦ 
*) 



)* 

* 
* 
*) 



*( 

* 

* 

(* 



Figure 5.7 - A Changed Address Using a Single Constant

r runs 000000
W 1741.5
EXECUTION.

Block 1 (locations 00-04), entry L1, exit L1

LOC  SOURCE          EXPRESSION   CROSS REFERENCE
00   A1  CLA  W      AC=7
01       STA  X      A/X=A/7      0 AC=7
02   X   CLA  **     AC=Z         1 A/X=A/7
03       STO  Y      Y=Z          2 AC=Z
04       TRA  A1

Block 2 (locations 05-07), data and storage

LOC  SOURCE
05   W   PZE  Z
06   Y   BSS  1
07   Z   OCT  1

R 3.583+3.400
Figure 5.8 - A Changed Address Using a Single Variable

r rim5 000000 
W 1801.0 
EXECUTION. 



00 Al 


CLA 


X 


01 


STA 


Y 


02 


CLA 


Y 


03 


STA 


B 


Oi» R 


CLA 


* * 


05 


STO 


C 


06 


TRA 


Al 



07 C BSS 

10 D OCT 

11 X PZE 

12 Y PZE 



************************ 




* 


*(. . 


. LI 


* AC=10 


* 




* A/Y=A/10 


* 


AC=10 


* AC=A/10 


* 


1 A/Y=A/10 


* A/B=A/A/1D 


* 


2 AC=A/10 


AC=C(A/A/10) 


* 


3 A/P=A/A/10 


* C = AC 


♦ 


It AC=r(A/A/10) 



LI 



************************ 



************************ 



12 



************************ 



R li. 200 + 2. 783 
the first iteration because of the STA Y instruction at location 1.
Because the analysis program believes that the new address of location 4
is also a variable, the functional expression for that location states
that the AC contains the contents of the location whose address is 10 or
D. (In Iverson Notation, A/A/10 = A/10.)

Figure 5.9 shows a third example of a program containing a changed 
address. The instruction at location 5 can have its address changed 
from either location 1 or location 4. The cross reference expressions 
at location 5 show the two possible values for its new address. If 
location 1 changes the address, it becomes location 10 or D. If location 4 
changes the address, it becomes location 12 or E. Because the address 
of location 5 can be changed from two possible locations, its func- 
tional expression states that the contents of an undetermined location 
are placed into the AC. 
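The first-iteration treatment of a changed address in Figures 5.7 and 5.9 can be approximated by the sketch below. It covers only the single-constant and the multiple-expression cases, not the single-variable case of Figure 5.8; the function name and the symbol table argument are hypothetical.

```python
UNDETERMINED = "**"


def resolve_changed_address(candidate_addresses, symbols=None):
    """Return the operand of a changed address instruction.

    candidate_addresses holds every address that some modifying instruction
    may store. A single candidate is resolved (and replaced by its symbol
    when one is known); anything else stays undetermined.
    """
    distinct = set(candidate_addresses)
    if len(distinct) != 1:
        return UNDETERMINED
    (address,) = distinct
    return (symbols or {}).get(address, address)


# Figure 5.7: only location 1 changes the address, and it stores 7 (Z).
single = resolve_changed_address({0o7}, {0o7: "Z"})
# Figure 5.9: locations 1 and 4 store 10 (D) and 12 (E) respectively.
multiple = resolve_changed_address({0o10, 0o12}, {0o10: "D", 0o12: "E"})
print(f"AC=C({single})", f"AC=C({multiple})")
```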



5.2.5 The Indirect Address 

Figure 5.10 shows the first example of a program containing an
indirectly addressed instruction. The analysis program detects that
the address portion of location A is a constant and that the indirectly
addressed instruction actually is a CLA C instruction. Therefore, this
instruction loads the contents of location C into the AC. During the
first iteration, the Data Gathering Phase had no reason to generate a
passive reference to location C. Thus, the analysis program does not
yet know that location C is the constant, 1.

Figure 5.11 shows a second example of a program containing an 
Figure 5.9 - A Changed Address Using Two or More Expressions

r runs 000000 000000 
W 1809.3 
EXECUTION. 



00 Al 


CLA 


X 


01 


STA 


B 


02 


TNZ 


B 



************************ 
* *(... L3 

* AC-10 ♦ 

* A/B-A/10 * AC^IO 

* * 

2 * •)... 3 

************************ 



03 
Ok 



CLA 
STA 



************************ 
3 * * 

* AC-12 * 

* A/B-A/12 * 
k * * 

************************ 



3 AC-12 



05 B CLA ** 



06 
07 



STO 
TRA 



Z 

Al 



************************ 

5 * •(.. . 1 

* AC-C(**) * 1 A/B-A/10 

* * I, A/B-A/12 

* Z«C(»*) ♦ 5 AC-C(**) 

* * 

7 * •)... LI 

************************ 



10 OCT 1 

11 X PZE D 

12 E OCT 2 

13 Y PZE E 
lit Z BSS 1 



************************ 

10 * * 

* * 

* * 

* * 

* * 

* * 
II, * * 

************************ 



R 5.566+3.783 
Figure 5.10 - An Indirect Address Using a Constant



r run5 000000 
W 2053.7 
EXECUTION. 



00 Al CLA* A 

01 STO B 

02 TRA Al 



************************ 
* * ( . . . L 1 

* AC=C * 

* B=C * AC=C 

* * 

2 * *).. . LI 

************************ 



05 A PZE C 

4 B BSS 1 

05 C OCT 1 



************************ 

* * 

* * 

* * 

* * 

* * 
************************ 



R 5.933+2.950 
Figure 5.11 - An Indirect Address Using a Single Variable



r runs 000000 
W 1816.8 
EXECUTION. 



00 Al 


CLA 


D 


01 


STA 


E 


02 


CLA* 


E 


03 


STO 


B 


01) 


TRA 


Al 



************************ 

* *(. . . LI 

* AC=5 * 

* A/F=A/6 * AC=6 

* AC=A/5* * 1 A/E=A/6 

* B=A/6* * 2 AC=A/6* 
« * 

I* * *)... LI 

************************ 



05 B BSS 1 

06 C OCT 1 

07 D PZE C 
10 E PZE 



R 5.U35+2.550 



************************ 

5 * * 

* * 

* * 



10 



************************ 
indirectly addressed instruction. In this case the indirectly addressed
location, E, is a variable during the first iteration because of the
STA E instruction at location 1. Thus, the functional expression for
location 2 states that the AC is loaded indirectly from a location whose
address is 6 or C. Once again, the analysis program does not yet know
that location C is a constant.

Figure 5.12 shows a third example of a program containing an
indirectly addressed instruction. At location 5 the cross reference
expressions state that the address portion of the indirectly addressed 
location, X, can be either 11 or 13. Because the indirectly addressed 
location can have more than one expression, the functional expression 
states that the AC is loaded indirectly from X. 
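The constant and the multiple-expression cases of indirect addressing (Figures 5.10 and 5.12) can be sketched as a single lookup. The function name and the dictionary of candidate addresses are assumptions made for illustration; the single-variable case of Figure 5.11 is not modeled.

```python
def indirect_operand(symbol, address_candidates):
    """Operand shown for an indirectly addressed instruction such as CLA* A.

    address_candidates maps an indirectly addressed location to the set of
    addresses its address portion may hold. One candidate means the
    indirection can be removed; otherwise the operand is left indirect and
    is written with a trailing asterisk.
    """
    candidates = set(address_candidates.get(symbol, ()))
    if len(candidates) == 1:
        return candidates.pop()
    return symbol + "*"


# Figure 5.10: A is PZE C, so CLA* A is treated as CLA C.
constant_case = indirect_operand("A", {"A": {"C"}})
# Figure 5.12: the address portion of X can be 11 (C) or 13 (E).
multiple_case = indirect_operand("X", {"X": {"C", "E"}})
print(f"AC={constant_case}", f"AC={multiple_case}")
```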



5.2.6 The Tagged Address

Figure 5.13 shows an example of a program containing tagged 
instructions. At location 3 a tagged passive reference is made to location V 
using index register one. This is stated by the functional expression, 
AC = V(l). The cross reference expression at location 3 states that 
index register one was loaded with a constant, 1, at location 2. 



5.3 FLOWCHARTS CONTAINING OTHER ANALYSIS PROBLEMS 

The purpose of this section is to show examples of automatically 
produced flowcharts for programs containing general analysis problems 
which should be handled by any analysis system. 
Figure 5.12 - An Indirect Address Using Two or More Expressions
r runs 000000 
W 1717.6 
EXECUTION. 



00 Al CLA 

01 STA 



D 
X 



02 TNZ A2 



• *(... L3 

♦ AC-11 * 

• A/X-A/11 * AC-11 

* * 

2 * *)... 3 



03 CLA 

Ok STA 



F 
X 



************************ 
3 * * 

♦ AC-13 * 

* A/X-A/13 * 
l» • • 

************************ 



3 AC-13 



05 A2 CLA* X 



06 
07 



STO B 
TRA Al 



************************ 

5 • *(... 1 

* AC-X* • 1 A/X-A/11 

* « I| A/X-A/13 

* B-X* * 5 AC-X* 

* * 

7 ♦ *)... LI 

************************ 



10 B BSS 1 

11 C OCT 1 

12 D PZE C 

13 E OCT 2 
ll» F PZE E 
15 X PZE 



************************ 
10 * * 

* * 

* * 

* * 

* * 

* * 

* * 
15 * • 

************************ 
Figure 5.13 - A Tagged Instruction

r run5 000000
W 1856.0
EXECUTION.

Block 1 (locations 00-10), entry L1, exit L1

LOC  SOURCE          EXPRESSION   CROSS REFERENCE
00   A1  CLA  T      AC=1
01       STA  U      A/U=A/1      0 AC=1
02       LXA  U,1    IX1=A/A/1    1 A/U=A/1
03       CLA  V,1    AC=V(1)      2 IX1=A/A/1
04       STO  W      W=V(1)       3 AC=V(1)
05       LXA  X,2    IX2=A/3
06       CLA  Y,2    AC=Y(2)      5 IX2=A/3
07       STO  Z      Z=Y(2)       6 AC=Y(2)
10       TRA  A1

Block 2 (locations 11-17), data and storage

LOC  SOURCE
11   T   OCT  1
12   U   BSS  1
13   V   OCT  2
14   W   BSS  1
15   X   OCT  3
16   Y   OCT  4
17   Z   BSS  1
5.3.1 The Program Loop

Figure 5.14 shows an example of a program containing a loop. A
passive reference is made to location A at location 2. The cross
reference expressions indicate that A has two possible values. The
first, A = 1, is generated by location 1; the second, A = 2, is generated
by location 5. Note that the analysis program detects the second
expression even though location 5 is ahead of and in a loop with
location 2. Because of the difficulty in displaying the expression, AC = 1
or 2, the symbol A is retained in the functional expression for
location 2.



5.3.2 Temporary Storage 

Figure 5.15 shows a program which uses temporary storage. The 
constant value of A is carried through the sequence of loads and stores 
of the AC until location 12, where D = 1. Likewise, the constant value 
of W is carried through loads and stores of the MQ until location 13, 
where Z = 2. Thus, all references to temporary storage are eliminated 
at locations 12 and 13. 
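The elimination of temporary storage references can be imitated by carrying known values through a straight-line sequence of loads and stores. The sketch below is a simplification under stated assumptions: it handles only CLA, LDQ, STO, and STQ, ignores control flow, and the function name `propagate` is hypothetical.

```python
def propagate(program, constants):
    """Carry known values through loads and stores of the AC and MQ.

    program is a list of (operation, cell) pairs; constants maps a cell to
    its known value. Unknown cells are carried as their own symbol.
    """
    state = dict(constants)
    expressions = []
    for operation, cell in program:
        if operation == "CLA":
            state["AC"] = state.get(cell, cell)
            expressions.append(f"AC={state['AC']}")
        elif operation == "LDQ":
            state["MQ"] = state.get(cell, cell)
            expressions.append(f"MQ={state['MQ']}")
        elif operation == "STO":
            state[cell] = state.get("AC", "AC")
            expressions.append(f"{cell}={state[cell]}")
        elif operation == "STQ":
            state[cell] = state.get("MQ", "MQ")
            expressions.append(f"{cell}={state[cell]}")
        else:
            raise ValueError(f"unsupported operation: {operation}")
    return expressions


# The load and store sequence of Figure 5.15, with A = 1 and W = 2.
program = [("CLA", "A"), ("LDQ", "W"), ("STO", "B"), ("STQ", "X"),
           ("CLA", "B"), ("LDQ", "X"), ("STO", "C"), ("STQ", "Y"),
           ("CLA", "C"), ("LDQ", "Y"), ("STO", "D"), ("STQ", "Z")]
expressions = propagate(program, {"A": 1, "W": 2})
print(expressions[-2:])
```

The last two expressions name no temporary cell, which is the effect described for locations 12 and 13.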



5.3.3 Parallel Latest Reference Search Paths 

Figure 5.16 shows a program which contains two parallel latest 
reference search paths from a passive reference to an active reference. 
At location 5 there is a passive reference to B. The latest reference 
Figure 5.14 - A Program Loop

r runs 000000
W 1825.7
EXECUTION.

Block 1 (locations 00-01)

LOC  SOURCE          EXPRESSION   CROSS REFERENCE
00       CLA  X      AC=1
01       STO  A      A=1          0 AC=1

Block 2 (locations 02-06), entry L2, exit L2

LOC  SOURCE          EXPRESSION   CROSS REFERENCE
02   A1  CLA  A      AC=A         1 A=1
                                  5 A=2
03       STO  B      B=A          2 AC=A
04       CLA  Y      AC=2
05       STO  A      A=2          4 AC=2
06       TRA  A1

Block 3 (locations 07-12), data and storage

LOC  SOURCE
07   X   OCT  1
10   Y   OCT  2
11   A   BSS  1
12   B   BSS  1
Figure 5.15 - The Elimination of Temporary Storage References

r runs 000000
W 1832.4
EXECUTION.

Block 1 (locations 00-14), entry L1, exit L1

LOC  SOURCE          EXPRESSION   CROSS REFERENCE
00   A1  CLA  A      AC=1
01       LDQ  W      MQ=2
02       STO  B      B=1          0 AC=1
03       STQ  X      X=2          1 MQ=2
04       CLA  B      AC=1         2 B=1
05       LDQ  X      MQ=2         3 X=2
06       STO  C      C=1          4 AC=1
07       STQ  Y      Y=2          5 MQ=2
10       CLA  C      AC=1         6 C=1
11       LDQ  Y      MQ=2         7 Y=2
12       STO  D      D=1          10 AC=1
13       STQ  Z      Z=2          11 MQ=2
14       TRA  A1

Block 2 (locations 15-24), data and storage

LOC  SOURCE
15   A   OCT  1
16   B   PZE
17   C   PZE
20   D   PZE
21   W   OCT  2
22   X   PZE
23   Y   PZE
24   Z   PZE
Figure 5.16 - Parallel Latest Reference Search Paths

r runs 000000 
W 181*0.3 
EXECUTION. 



00 Al CLA 

01 STO 

02 T2E 



A 
B 
A2 



************************ 



* 

* 

* 

* 

2 * 



AC=0 
B»0 



*(... L3 

* 

* AC=0 

* 

*)... 3 



************************ 



03 CLA 

Ok STO 



X 
Y 



************************ 
3 * * 

* AC=1 * 

* Y=l * 
l» * * 

************************ 



3 AC=1 



05 A2 CLA B 

06 STO C 

07 TRA Al 



************************ 
5 * *(...! 

* AC«0 * 1 B»0 

* C-O * 5 AC=0 

* * 

7 * *)... LI 

************************ 



10 A PZE 

11 B BSS 1 

12 C BSS 1 

13 X OCT 1 
Ik Y BSS 1 



************************ 

10 * * 

* * 

* * 

* * 

* * 

* * 
III * * 

************************ 
search discloses two paths from location 5 to the active reference to
B at location 1. The first path is from Block 3 through Block 2 to
Block 1; the second path is from Block 3 directly to Block 1. The cross
reference expression at location 5 states that B = 0. Thus, the
functional expression for location 5 is AC = 0.



5.3.4 Multiple Latest Reference Search Paths 

Figure 5.17 shows a program which contains a passive reference with 
multiple latest references. At location 5 there is a passive reference 
to X. The cross reference expressions show two latest reference values. 
The first is X = 1 generated by location 1 in Block 1; the second is 
X = 2 generated by location 4 in Block 2. Because there are two latest 
expressions for X at location 5, the symbol X is used in the functional 
expression, AC = X. 
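The latest reference search of Sections 5.3.3 and 5.3.4 resembles a backward walk over the block graph that stops on each path at the first store into the cell. The sketch below is one way to express that idea, not the thesis algorithm; the block and predecessor tables and the function names are hypothetical.

```python
def latest_references(blocks, predecessors, start_block, use_location, cell):
    """Collect (location, value) pairs that can reach a passive reference.

    blocks maps a block number to its stores as (location, cell, value),
    in execution order; predecessors maps a block to the blocks that can
    transfer control to it.
    """
    found = set()
    visited = set()

    def scan(block, before):
        for location, target, value in reversed(blocks[block]):
            if target == cell and (before is None or location < before):
                found.add((location, value))
                return  # the latest store on this path hides earlier ones
        for previous in predecessors.get(block, []):
            if previous not in visited:
                visited.add(previous)
                scan(previous, None)

    scan(start_block, use_location)
    return found


def operand_text(cell, references):
    """Substitute the value only when every path agrees on it."""
    values = {value for _, value in references}
    return str(values.pop()) if len(values) == 1 else cell


predecessors = {1: [3], 2: [1], 3: [1, 2]}
# Figure 5.16: B is stored once, and two paths lead back to that store.
parallel = latest_references(
    {1: [(1, "B", 0)], 2: [(4, "Y", 1)], 3: [(6, "C", 0)]},
    predecessors, 3, 5, "B")
# Figure 5.17: X is stored at location 1 (Block 1) and location 4 (Block 2).
multiple = latest_references(
    {1: [(1, "X", 1)], 2: [(4, "X", 2)], 3: [(6, "Y", "X")]},
    predecessors, 3, 5, "X")
print("AC=" + operand_text("B", parallel), "AC=" + operand_text("X", multiple))
```

The starting block is deliberately not marked as visited, so a store that lies later in the same loop, as in Figure 5.14, is still found.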
Figure 5.17 - Multiple Latest Reference Search Paths



r runs 000000 
W 1847. d 
EXECUTION. 



00 Al 


CLA 


A 


01 


STO 


X 


02 


TZE 


A2 



************************ 
* *(... L3 

* AC-1 * 

* X-1 * AC-1 

* * 

2 * •)... 3 

************************ 



05 CLA 

Oil STO 



B 
X 



************************ 
3 * • 

* AC-2 * 

• X«2 * 
k * * 

********* A-************** 



3 AC-2 



05 A2 CLA X 



06 
07 



STO 
TRA 



Y 

Al 



************************ 
5 • *(... 1 

* AC-X * 1 X-1 

* * k X-2 

* Y«X ♦ 5 AC-X 

* * 

7 • •).., LI 

************************ 



10 A 


OCT 


1 


11 B 


OCT 


2 


12 X 


BSS 


1 


13 Y 


BSS 


1 



************************ 
10 * * 

* * 

* . * 

* * 

* * 

13 * * 

************************ 
CHAPTER 6



CONCLUSIONS 



In the previous chapters some of the problems and solutions of 
automatic program analysis were discussed. The initial problem that 
the analysis system faced was the cyclic interaction of control flow 
and data flow due to dependent instructions. This cyclic behavior
suggested an iterative procedure in which current results are used to
update and improve earlier approximations. The techniques and
procedures of the first iteration were presented, and actual flowcharts of
programs containing dependent instructions were displayed.

An analysis system should uncover what a program does and should 
transmit it to the user in a comprehensible form. The purpose of 
this chapter is to discuss the usefulness of the first iteration output 
and to suggest paths that can be followed in the second iteration to 
further improve the utility of these results. 



6.1 THE USEFULNESS OF THE FIRST ITERATION OUTPUT

When a programmer begins to lay out a program, he has a specific
job or function he wishes the machine to perform. For example, he 
might wish to write a subroutine which calculates sine x. The programmer 
knows that he must develop an algorithm for calculating sine x and 
then convert his algorithm into machine code. 
First, the programmer remembers from past experience that there
is an infinite series expansion for sine x of the form:

\[ \sin x = x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \frac{x^{7}}{7!} + \cdots \]

Second, the programmer knows that he must truncate the infinite series
after the n-th term because his machine has limited speed and accuracy.
Therefore, he develops an approximate function of the form:

\[ \sin x \approx \sum_{i=0}^{n} (-1)^{i} \frac{x^{2i+1}}{(2i+1)!} \]

Third, the programmer might now decide to transform his truncated series
into a rational approximation or to reduce the series length by applying
Chebyshev economization.

Fourth, the programmer minimizes the number of instructions and 
execution cycles by deriving an expression which can be imbedded in a 
program loop. If the third step was omitted, the expression might be 
of the form: 



SUM. = SUM. , + (-1)^ X SUM. ' 



Fifth, the programmer codes his algorithms using his own personal 
coding conventions and programming tricks. 
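As an illustration of the fourth step, the truncated series can be summed in a loop in which each term is derived from the previous one. The recurrence used here is a standard one chosen for the sketch and is not necessarily the expression the author had in mind; the function name `sine_series` is hypothetical.

```python
import math


def sine_series(x: float, n: int) -> float:
    """Sum the terms i = 0..n of the truncated sine series.

    Each term is obtained from the previous one by
    term_i = -term_(i-1) * x**2 / ((2i) * (2i + 1)),
    so no power or factorial is recomputed inside the loop.
    """
    term = x
    total = x
    for i in range(1, n + 1):
        term *= -x * x / ((2 * i) * (2 * i + 1))
        total += term
    return total


approximation = sine_series(0.5, 6)
print(approximation, math.sin(0.5))
```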
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When a program analysis system is applied to the final program it 
should reverse the programming process and uncover what the program does. 
Because there are still unprogrammed subroutines in the functional expres- 
sion generation program of the first iteration, the output flowcharts 
for the above example cannot be shown. If all functional expression 
generation subroutines were available, the first iteration should output 
expressions at the level of the fourth step shown above, i.e. how a 
program does what it does. 

In general, the output results show that it is possible to 
automate the initial stages of analyzing self-modifying programs. Such 
stages involve scanning the input program, detecting connected pieces, 
constructing elementary functional relationships, and pointing out 
trouble areas. The feasibility even at this level is open to question 
because the four analysis phases currently total some 11,000 instructions, 
pseudo-operations, and macros which assemble into nearly 100,000 memory 
locations. The time-shared execution time averages about thirty seconds 
for each of the short example programs shown in Chapter 5. (Because the 
analysis system was developed and debugged on an experimental time-shared 
system, the analysis program organization was dictated by the characteris- 
tics of the time-sharing monitor, not execution time or memory length. 
Thus, times and lengths are somewhat exaggerated.) It is hard to give 
an objective evaluation as to the usefulness of the first iteration 
output because the missing functional generation routines made it impos- 
sible to ask a large sample of programmers to use the output in their 
debugging or documentation tasks. It is true that the usefulness of these 
output results would be improved if they were refined by a second 
iteration. 



6.2 THE PROBLEMS OF THE SECOND ITERATION 

Throughout the first iteration, many approximations were made in 
order to bootstrap through the control flow - data flow interaction 
cycle. The second iteration must check those approximations and update 
them if necessary. The purpose of this section is to point out and 
describe promising areas of further research which should improve 
the results of the first iteration. 

Probably the first area which should be explored is the utilization 
of the functional expressions generated at the end of the first 
iteration. This would involve the development of a functional expression 
simplification and manipulation subroutine similar to the work being 
done with the LISP programming language. Such a subroutine would be used 
to remove the superfluous Iverson Notation symbols introduced by the 
many program procedural and bookkeeping operations, e.g., A/A/1 = A/1 = 1. 
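
A minimal sketch of such a simplification rule is given here. It assumes that `A/` denotes a reduction operator and that reducing a scalar leaves the scalar unchanged, which is one reading of the example A/A/1 = A/1 = 1. The `Reduce` class, the use of strings for named vectors, and the `simplify` function are hypothetical and are not part of the analysis program.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Reduce:
    """A reduction such as A/x: operator `op` applied across `arg`."""
    op: str
    arg: "Expr"


# An expression is a scalar constant, a named vector, or a reduction.
Expr = Union[int, str, Reduce]


def simplify(expr: Expr) -> Expr:
    """Remove reductions whose operand is already a scalar constant."""
    if isinstance(expr, Reduce):
        inner = simplify(expr.arg)
        if isinstance(inner, int):
            # Assumed rule: reducing a scalar yields the scalar itself.
            return inner
        return Reduce(expr.op, inner)
    return expr


if __name__ == "__main__":
    print(simplify(Reduce("A", Reduce("A", 1))))   # nested reduction of a scalar
    print(simplify(Reduce("A", "V")))              # reduction of a named vector is kept
```

A full simplification subroutine would need many more rules; the sketch only shows the bottom-up rewriting pattern that such rules would share.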

A second promising area is the utilization of the input data of 
the program being analyzed. This would require the development of a 
descriptive language which would convey the meaning and scope of the 
input data. Such additional information could be used to reduce the 
almost limitless possible program outcomes. 

A third promising area is the development of a second iteration 
which would interact on-line with a human analyzer. The first iteration 
would handle the routine analysis functions and tell the second iteration 
where help was needed. The second iteration would display its current 
results and ask for help. After the human being decided how the situa- 
tion should be handled, the second iteration would use the new directions 
to update its current analysis results. 
APPENDIX ONE 



FLOWCHARTS OF THE ANALYSIS PROGRAM 



The purpose of this appendix is to present the flowcharts of the 
analysis program. The presentation is divided into four parts according 
to the analysis phases as shown in Figure 4.1. Because of the size and 
complexity of the analysis programs, only execution order and computation 
summary are shown. 
PHASE ONE 

MAIN1 is the main program of Phase One as shown in Figure 4.1. 
MAIN1 reads the input program one line at a time. Since the FAP assembler 
produces a variable format output tape, MAIN1 must decide what type of 
information is present on each line. Usually, MAIN1 will scan through 
the page headings, comments, and blank lines until the Transfer Vector 
is reached. Thereupon, the Transfer Vector entries are copied into the 
Transfer Vector Table. When an instruction is found, control is 
transferred to OPCODE for operation code identification. After OPCODE has 
identified the instruction and picked up its code word, RECODE recodes 
the instruction line into various lists as a function of the code word. 

RECODE scans across the code word bit by bit. If a bit is set or 
on, control is transferred to its particular subroutine. Bit 1 is used 
to find the first executable instruction. Bit 2 is used to flag an 
instruction which must be treated as an exception. Bit 3 signifies a 
type 1 transfer, i.e. one which always transfers to the location specified 
by its address, e.g. a TXI instruction. Bit 4 denotes a type 1 transfer 
which can be tagged or indirectly addressed, e.g. a TRA instruction. 
Bit 5 specifies a type 2 transfer, i.e. one which can transfer control 
to either the address location or the next sequential location, e.g. a 
TXH instruction. Bit 6 signifies a type 2 transfer which can be tagged 
or indirectly addressed, e.g. a TZE instruction. Bit 7 shows a type 3 
transfer, i.e. one which can transfer control to either of the next two 
sequential instructions, e.g. a ZET instruction. Bit 8 denotes a type 4 
transfer, i.e. one which can transfer control to any of the next three 
sequential instructions, e.g. a CAS instruction. Bit 9 is reserved for 
the TSX instruction. Bits 10 and 11 are used by the XEC and various 
I/O instructions. Bits 12 and 13 specify Storage and Data Pseudo 
Operations, such as BSS and OCT. Bits 14 through 19 are reserved for the 
various referencing instructions. The Refer type transmits information 
from one location to another, e.g. a CLA instruction. The Use type uses 
the contents of one location to transform the contents of another 
location, e.g. the ORA instruction. The Test type tests the contents of 
various locations in order to make a transfer decision, e.g. the TZE 
instruction. The Set type sets the contents of a location to a known 
value, e.g. the STZ instruction. A Shift instruction shifts the bits 
of some register, e.g. the ALS instruction. An Arithmetic type performs 
numerical operations, e.g. the ADD instruction. Bits 27 to 36 contain 
a compact Short Code used to recode the instruction's operation code. 
The Short Codes are numbered consecutively and lend themselves to table 
lookups. 
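
The bit-by-bit scan of the code word can be sketched as follows. The sketch assumes a 36-bit word in which bit 1 is the leftmost (most significant) bit and bits 27 to 36 are the ten low-order bits; this numbering convention is an assumption, as are the names `FLAG_BITS`, `make_code_word`, and `decode_code_word`. Only the bit assignments themselves are taken from the description above.

```python
WORD_BITS = 36

# Bit assignments as described for the OPCODE table code word.
FLAG_BITS = {
    1: "executable", 2: "exception",
    3: "type1_transfer", 4: "type1_tag_transfer",
    5: "type2_transfer", 6: "type2_tag_transfer",
    7: "type3_transfer", 8: "type4_transfer",
    9: "tsx_transfer", 10: "xec", 11: "io",
    12: "storage_pseudo_op", 13: "data_pseudo_op",
    14: "refer", 15: "use", 16: "test",
    17: "set", 18: "shift", 19: "arithmetic",
}
SHORT_CODE_MASK = 0x3FF  # bits 27..36, assumed to be the low-order ten bits


def make_code_word(bits, short_code):
    """Build a code word with the given flag bits set (assumed layout)."""
    word = short_code & SHORT_CODE_MASK
    for n in bits:
        word |= 1 << (WORD_BITS - n)   # bit 1 is assumed leftmost
    return word


def decode_code_word(word):
    """Return the names of the set flag bits, in bit order, and the Short Code."""
    flags = [name for n, name in sorted(FLAG_BITS.items())
             if (word >> (WORD_BITS - n)) & 1]
    return flags, word & SHORT_CODE_MASK


if __name__ == "__main__":
    word = make_code_word([1, 5, 16], short_code=42)
    print(decode_code_word(word))
```

In RECODE each set bit transfers control to its own subroutine; the list of flag names returned here plays the role of that dispatch sequence.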
Figure A1.1 - MAIN1 

MAIN1 
Get Input Program Name 
Read External TSX File 
    | 
Read Next Input Program Line   <-- Next Line 
    | 
Is Line a Page Heading?   Yes --> Next Line 
    | No 
Is Line a Comment?   Yes --> Next Line 
    | No 
Is Line a "MACRO" Instruction? 
    | Yes 
Set Macro Definition Flag 

Is Macro Definition Flag Set? 
    | Yes 
Is Line a Macro "END" Instruction? 
    | Yes 
Reset Macro Definition Flag 

Convert Assigned Location from BCD to Binary 
Convert Numerical Instruction from BCD to Binary 
    | 
Was BCD Operation Code Blank? 

Is Inside Transfer Vector Switch On? 
    | Yes 
Was BCD Location Blank? 
    | Yes 
Reset Inside Transfer Vector Switch --> Next Line 

Make Transfer Vector Table Entry --> Next Line 

Is Instruction Line the Transfer Vector Heading? 
    | Yes 
Set Inside Transfer Vector Switch --> Next Line 

Construct BCD Operation Code 
    | 
If Indirect "*" Found, Set Indirect Flag 
If Address has "**", Set "**" Flag 
    | 
Identify BCD Opcode and Pickup Code Word (OPCODE) 
    | 
Make List Entries (RECODE)   Not "END" --> Next Line 
    | "END" 
Process Internal TSX Returns 
    | 
Read Next Line 
    | 
Is Line the Last Line Used Statement?   No --> Read Next Line 
    | Yes 
Make Special Exit List Entry for Last Location 
    | 
Read Next Line 
    | 
Is It Symbol Heading Line?   No --> Read Next Line 
    | Yes 
Construct Symbol Table 
    | 
MAIN2 
Figure A1.2 - OPCODE 

OPCODE 
    | 
Find Matching BCD Operation Code Entry 
    | 
Pickup Code Word Entry 
    | 
Transmit Code Word to RECODE --> MAIN1 

The OPCODE Table Entry: 

BCD Instruction Operation Code 

Code Word bits:  1 2 3 4 ... 27 ... 36 

Bit 1 - Executable Instruction 
Bit 2 - Exception 
Bit 3 - Type 1 Transfer 
Bit 4 - Type 1 Tag Transfer 
Bit 5 - Type 2 Transfer 
Bit 6 - Type 2 Tag Transfer 
Bit 7 - Type 3 Transfer 
Bit 8 - Type 4 Transfer 
Bit 9 - TSX Transfer 
Bit 10 - XEC Instruction 
Bit 11 - I/O Instruction 
Bit 12 - Storage Pseudo Operation 
Bit 13 - Data Pseudo Operation 
Bit 14 - Refer Type Reference 
Bit 15 - Use Type Reference 
Bit 16 - Test Type Reference 
Bit 17 - Set Type Reference 
Bit 18 - Shift Type Reference 
Bit 19 - Arithmetic Type Reference 
Bits 27 to 36 - A Compact Numerical Instruction Code 
                used to recode the Operation Code for 
                later table lookup identifications 
Figure A1.3 - RECODE 

RECODE 
    | 
Is Executable Instruction Bit Set (Bit 1)? 
    | Yes 
Is First Executable Instruction Flag Set? 
    | No 
Add Instruction Location to Starting Location List 
Set First Executable Instruction Flag 

Is Exception Bit Set (Bit 2)? 

Does Instruction Short Code Indicate an "END"?   Yes --> MAIN1 
Does Instruction Short Code Indicate an "ENTRY"?   No --> MAIN1 
    | Yes 
Add Entry Location to Starting Location List 
Set First Executable Instruction Flag 
    | 
Add Starting Location Entry to the Entry Point List --> MAIN1 

Copy Binary Location and Binary Instruction Onto Binary File 
    | 
Is Type 1 Transfer Bit Set (Bit 3)?   Yes --> T1 
    | No 
Is Type 1 Tag Transfer Bit Set (Bit 4)?   Yes --> T1TAG 
    | No 
Is Type 2 Transfer Bit Set (Bit 5)?   Yes --> T2 
    | No 
Is Type 2 Tag Transfer Bit Set (Bit 6)?   Yes --> T2TAG 
    | No 
Is Type 3 Transfer Bit Set (Bit 7)?   Yes --> T3 
    | No 
Is Type 4 Transfer Bit Set (Bit 8)?   Yes --> T4 
    | No 
Is TSX Transfer Bit Set (Bit 9)?   Yes --> TSXTRN 
    | No 
Is XEC Bit Set (Bit 10)?   Yes --> XEC (Not Programmed) 
    | No 
Is I/O Bit Set (Bit 11)?   Yes --> I/O (Not Programmed) 
    | No 
Is Storage Bit Set (Bit 12)?   Yes --> STORAG 
    | No 
Is Data Bit Set (Bit 13)?   Yes --> DATGEN 
    | No 
Is Refer Type Reference Bit Set (Bit 14)?   Yes --> REFER 
    | No 
Is Use Type Reference Bit Set (Bit 15)?   Yes --> USE 
    | No 
Is Test Type Reference Bit Set (Bit 16)?   Yes --> TEST 
    | No 
Is Set Type Reference Bit Set (Bit 17)?   Yes --> SET 
    | No 
Is Shift Type Reference Bit Set (Bit 18)?   Yes --> SHIFT 
    | No 
Is Arithmetic Type Reference Bit Set (Bit 19)?   Yes --> ARITH 
    | No 
MAIN1 
Figure A1.4 - T1 

T1 
    | 
Increment Entry Point List Counter 
Make Single Entry Point List Entry 
Increment Exit Point List Counter 
    | 
Make Single Exit Point List Entry --> RECODE 


Figure A1.5 - T1TAG 

T1TAG 
    | 
Is the Type 1 Transfer Tagged? 
    | Yes 
Set Tagged Flag in "f" 
    | 
Is Transfer Address "Small Constant"? 
    | Yes 
Set Probable Subroutine Return Flag in "f" 
    | 
Make TSX Return List Entry 

Is the Type 1 Transfer Indirectly Addressed? 
    | Yes 
Set Indirect Flag in "f" 
    | 
Are Either Tagged or Indirect Flags Set?   No --> T1 
    | Yes 
Increment Exit Point List Counter 
    | 
Make Single Exit Point List Entry Using Flagged "f" --> RECODE 

Figure A1.6 - T2 

T2 
    | 
Increment Entry Point List Counter 
Make Double Entry Point List Entry 
Increment Exit Point List Counter 
    | 
Make Double Exit Point List Entry --> RECODE 



Figure A1.7 - T2TAG 

T2TAG 
    | 
Is the Type 2 Transfer Tagged? 
    | Yes 
Set Tagged Flag in "f" 
    | 
Is the Type 2 Transfer Indirectly Addressed? 
    | Yes 
Set Indirect Flag in "f" 
    | 
Are Either Tagged or Indirect Flags Set?   No --> T2 
    | Yes 
Increment Exit Point List Counter 
    | 
Make Single Exit Point Entry Using Flagged "f" 
    | 
Make Single Exit Point Entry With no Flag in "f" 
    | 
Increment Entry Point List Counter 
    | 
Make Single Entry Point Entry With no Flag in "f" --> RECODE 



Figure A1.8 - T3 

T3 
    | 
Increment Entry Point List Counter 
Make Two Entry Point List Entries 
    | 
Increment Exit Point List Counter 
    | 
Make Two Exit Point List Entries --> RECODE 



Figure A1.9 - T4 
Figure A1.10 - TSXTRN 
Figure A1.11 - DATGEN 
Figure A1.12 - STORAG 
Figure A1.13 - REFER 
Figure A1.14 - USE 
Figure A1.15 - TEST 
Figure A1.16 - SET 
Figure A1.17 - SHIFT 
Figure A1.18 - ARITH 
PHASE TWO 

MAIN2 is the main program of Phase Two as shown in Figure 4.1. 
MAIN2 calls seven subroutines which perform the required Data Processing 
functions. Because of programming considerations, the Data Reduction 
function of breaking the program into blocks is performed at the end of 
this phase. SET21 reads the various temporary data files into memory. 
PART finds which portions of each cell are actively referenced. CONSAT 
determines which passive reference entries reference constants and which 
active reference entries reference results. GETCON finds the value of 
each constant cell by scanning the Binary File. SWITCH detects any 
transfer switches and corrects the Entry Point and Exit Point Lists. 
CHANGE identifies and flags all modified instructions. TOPSET breaks 
the program into blocks and constructs the Control Tables. 
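
The block-breaking step can be illustrated with a minimal sketch. It assumes that a new block begins at every entry point (a location that can receive a transfer) and that a block ends at every exit point (a location holding a transfer instruction); this is the usual basic-block rule and is offered as an assumption, not as the TOPSET algorithm itself. The function name and arguments are hypothetical.

```python
def split_into_blocks(locations, entry_points, exit_points):
    """Partition sorted instruction locations into straight-line blocks."""
    blocks = []
    current = []
    for loc in locations:
        # An entry point starts a new block unless one has just started.
        if loc in entry_points and current:
            blocks.append(current)
            current = []
        current.append(loc)
        # An exit point closes the block it belongs to.
        if loc in exit_points:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    print(split_into_blocks(range(8), entry_points={3}, exit_points={1, 5}))
```

Each resulting block has a first and a last location, which is the kind of start and end information that the Control Tables would need to record for each block.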
Figure A1.19 - MAIN2 
Figure A1.20 - SET21 
Figure A1.21 - PART 
Figure A1.22 - CONSAT 
Figure A1.23 - GETCON 
Figure A1.24 - SWITCH 
Figure A1.25 - CHANGE 
Figure A1.26 - TOPSET 
PHASE THREE 

MAIN3 is the main program of Phase Three as shown in Figure 4.1. 
MAIN3 calls five subroutines which perform the required Data Reduction 
functions. SET31 reads the Control Tables into memory and converts the 
To and From Table contents from instruction locations to Block Numbers. 
CONECT checks the block interconnections and makes the required 
corrections. LOOP detects all program loops and flags both To and From Table 
loop closing branches. SET32 loads the Active and Passive Reference 
Lists into memory and constructs the active and passive entries in the 
Topology Table. LATEST determines the latest reference sets for each 
passive reference and stores the latest reference information in the 
Latest Reference and User Lists. 
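
One common way to find loop closing branches is a depth-first search that flags every branch leading back to a block still on the current search path. The sketch below uses that technique as an assumption; the method actually used by LOOP is not stated in this summary. The `to_table` dictionary (block number to successor block numbers) and the function name are hypothetical.

```python
def loop_closing_branches(to_table, start):
    """Return (from_block, to_block) pairs that close a loop."""
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {}
    closing = []

    def visit(block):
        state[block] = ON_PATH
        for successor in to_table.get(block, []):
            status = state.get(successor, UNVISITED)
            if status == ON_PATH:
                # The branch returns to a block on the current path.
                closing.append((block, successor))
            elif status == UNVISITED:
                visit(successor)
        state[block] = DONE

    visit(start)
    return closing


if __name__ == "__main__":
    to_table = {1: [2], 2: [3], 3: [2, 4], 4: []}
    print(loop_closing_branches(to_table, start=1))
```

Each flagged pair corresponds to one entry in the To Table of the branching block and one entry in the From Table of the block that heads the loop.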
Figure A1.27 - MAIN3 
Figure A1.28 - SET31 

SET31 
    | 
Load Topology Table into Memory 
    | 
Load To Table into Memory 
    | 
Sort To Table into Sequential Order 
    | 
Are There More To Table Entries?   No --> Resort To Table into Original Order 
    | Yes 
Get Next To Table Entry 
    | 
Get Next Topology Entry "START" Portion 
    | 
Does To Table Entry Equal "START"?   No --> Get Next Topology Entry "START" Portion 
    | Yes 
Replace To Table Entry by Block Number of Topology Entry 

Resort To Table into Original Order 
Load From Table into Memory 
    | 
Sort From Table into Sequential Order 
    | 
Are There More From Table Entries?   No --> Resort From Table into Original Order 
    | Yes 
Get Next From Table Entry 
    | 
Get Next Topology Entry "END" Portion 
    | 
Does From Table Entry Equal "END"?   No --> Get Next Topology Entry "END" Portion 
    | Yes 
Replace From Table Entry by Block Number of Topology Entry 

Resort From Table into Original Order 
Load Starting Location List into Memory 
    | 
Sort Starting Location List into Sequential Order 
    | 
Are There More Starting Location List Entries? 
    | Yes 
Get Next Starting Location List Entry 
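
The conversion performed by SET31 can be sketched as follows. The flowchart matches sorted table entries against the sorted Topology Table; the sketch swaps in a dictionary lookup, which gives the same replacement without the sorting and resorting steps. The `Block` record, the function name, and the choice to leave unmatched entries unchanged are assumptions made for illustration.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical Topology Table entry."""
    number: int
    start: int   # location of the first instruction in the block
    end: int     # location of the last instruction in the block


def to_block_numbers(entries, topology, portion):
    """Replace instruction locations by Block Numbers.

    portion is "start" for To Table entries and "end" for From Table
    entries, following the two halves of the flowchart.
    """
    lookup = {getattr(block, portion): block.number for block in topology}
    # Entries with no matching block are left unchanged (an assumption).
    return [lookup.get(location, location) for location in entries]


if __name__ == "__main__":
    topology = [Block(1, 100, 104), Block(2, 105, 110)]
    print(to_block_numbers([105, 100], topology, "start"))
    print(to_block_numbers([104, 110], topology, "end"))
```

Because the lookup preserves the order of the input list, no step corresponding to "Resort ... into Original Order" is needed in this form.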
Figure A1.29 - CONECT 
Are There More Reachable List Entries? 
Figure A1.30 - LOOP 
Figure A1.31 - SET32 
Figure A1.32 - LATEST 
PHASE FOUR 

For programming purposes, Phase Four is divided into two parts. 
MAIN4 is the main program of the first part of Phase Four. MAIN4 
calls two subroutines which generate the functional expressions. SET41 
reads the various lists and tables into memory and constructs the Latest 
and User entries in the Topology Table. PERT first generates a reachable 
block list and then constructs a functional expression for each active 
reference. MAIN5 is the main program of the second part of Phase Four. 
MAIN5 calls two subroutines which produce the detailed output flowchart. 
SET51 reads the various lists into memory and sorts the Message Pointer 
List into sequential order. OUTPUT uses the Topology Table and the 
ordered Message Pointer List to produce the output flowchart. 
Figure A1.33 - MAIN4 
Figure A1.34 - SET41 
REFER4 
Figure A1.35 - MAIN5 
Figure A1.37 - SET51 
Figure A1.38 - OUTPUT 
Print BCD Source Instruction 
    | 
Print Ending Location Line 
Print Bottom Asterisk Line 
    | 
Is Connect Switch Set? 
    | Yes 
Print Block Connecting Lines 
    | 
Reset Connect Switch --> More Blocks 
APPENDIX TWO 
FLOWCHARTS OF ACTUAL PROGRAMS 

One standard question has been, "Can the Analysis Program analyze 
itself?" The purpose of this appendix is to display flowcharts of 
analysis subroutines produced automatically by the analysis program. 
In general, the Topological Flowcharts are accurate, while the Detailed 
Flowcharts are incomplete due to unprogrammed functional generation 
subroutines. 

Figure A2.1a shows the listing of a subroutine which converts 
the binary number contained in the logical AC into a BCD number with 
leading blanks. Figure A2.1b displays the Topological Flowchart of 
the conversion program while Figure A2.1c displays the Detailed Flowchart. 
Figure A2.1a - A Binary to BCD Conversion Subroutine 
Figure A2.1b - The Topological Flowchart for Figure A2.1a 
Figure A2.1c - The Detailed Flowchart for Figure A2.1a 

Figure A2.2a shows the listing of a program which sorts a table on 
its address portion and then within the same address by tag. First, 
the program interchanges the address and tag portions of each entry. 
Second, the program calls a binary sort routine, DSRT18, to perform the 
sort. Third, the program returns the addresses and tags to their 
original positions. Figure A2.2b displays the Topological Flowchart 
of the program. 
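
The three steps can be sketched as follows. The sketch assumes a 15-bit address and a 3-bit tag that together fill the right-hand eighteen bits of a word, and it uses Python's built-in sort as a stand-in for DSRT18; the field widths, the function name, and the list-of-pairs representation are assumptions made for illustration.

```python
TAG_BITS = 3            # assumed width of the tag field
TAG_MASK = (1 << TAG_BITS) - 1


def sort_by_address_then_tag(entries):
    """Sort (address, tag) pairs by address, then by tag within an address."""
    # Step 1: interchange the fields so the address occupies the
    # high-order bits of the 18-bit key and the tag the low-order bits.
    keys = [(address << TAG_BITS) | tag for address, tag in entries]
    # Step 2: sort on the 18-bit key (stand-in for the DSRT18 call).
    keys.sort()
    # Step 3: return the addresses and tags to their original positions.
    return [(key >> TAG_BITS, key & TAG_MASK) for key in keys]


if __name__ == "__main__":
    print(sort_by_address_then_tag([(5, 1), (2, 3), (5, 0), (2, 1)]))
```

Interchanging the fields lets a single sort on one key produce the two-level ordering, which is why the program does not need a separate tag sort.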
Figure A2.2a - An Address and Tag Sort Subroutine 
Figure A2.2b - The Topological Flowchart for Figure A2.2a 

Figure A2.3a shows the listing of the DSRT18 program which performs 
a binary sort using only the right-hand eighteen bits. Figure A2.3b 
displays the Topological Flowchart of the sort program. While the 
DSRT18 subroutine was being analyzed, the CONECT subroutine found that 
there was an instruction just above location SORT31 in DSRT18 which 
could not be reached. This unreachable instruction turned out to be 
extraneous and must have been inserted while the program was being 
prepared for input to the computer. After the extra instruction was 
removed, the analysis program ran to completion. 
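
The kind of check that exposes such an instruction can be sketched with a breadth-first search over the possible successors of each location. The search method, the `successors` dictionary, and the function name are assumptions made for illustration; the CONECT flowchart itself is not reproduced here.

```python
from collections import deque


def unreachable_locations(successors, entry_points):
    """Return locations that no path from an entry point can reach."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        location = queue.popleft()
        for nxt in successors.get(location, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(set(successors) - seen)


if __name__ == "__main__":
    # Location 1 always transfers to location 3, so location 2, which
    # sits just above the transfer target, is never executed.
    successors = {0: [1], 1: [3], 2: [3], 3: []}
    print(unreachable_locations(successors, entry_points=[0]))
```

An instruction that follows an unconditional transfer and is not itself the target of any transfer shows up in the returned list, as in the DSRT18 case described above.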



Figure A2.3a - A Binary Sort Subroutine 

[FAP assembly listing of the DSRT18 binary sort subroutine. Header 
comments: subroutine to sort a list of numbers; the smallest number is 
at low octal. Calling sequence: a TSX to the subroutine followed by two 
PZE words giving the location of the high octal + 1 address and the 
count address.] 



Figure A2.3b - The Topological Flowchart for Figure A2.3a 